Principal Investigator: Lee, William M. F.


Proposal  Title:  Multiplex  Quantitative  Histologic  Analysis  of  Human  Breast  Cancer 

Cell  Signaling  and  Cell  Fate 


Introduction 

The  objective  of  this  proposal  was  to  build  an  advanced  platform  for  immunohistological  study  of  breast  cancer 
specimens  that  retrieves  multiplex  quantitative  molecular  information  on  a  cellular  basis.  Three  components  make  up  this 
platform:  (1)  Multiplex  tissue  immunostaining  protocols  for  revealing  structural,  cell-type  and  analyte  antigens  in  the 
same  histologic  section  (the  first  two  types  of  antigens  are  to  help  segment  and  classify  cells,  and  the  last  type  of  antigen 
is  to  reveal  biological  processes/events  and  prognostic/predictive  biomarkers  of  interest);  (2)  imaging  by  multispectral 
microscopy  to  capture  the  information  revealed  by  individual  stains  in  multiplex  combinations;  and  (3)  software 
(FARSIGHT)  for  automated  multispectral  image  analysis  that  (i)  segments  individual  nuclei  and  cells  in  images,  (ii) 
classifies  the  segmented  nuclei/cells  into  cell  types  of  interest  based  on  their  association  with  structural/cell-type  antigen 
staining and spatial/textural features, and (iii) quantifies analyte expression on a cellular basis by associating analyte
staining  with  the  segmented  nuclei/cells.  Operation  of  the  platform  was  developed  using  human  breast  cancer  specimens 
with  the  goal  of  quantifying  cytometrically  antigen  staining  of  known  prognostic/predictive  value  (ER,  PR,  HER2)  or 
reporting  biological  events  and  processes  (Ki67,  p-ERK,  p-AKT,  p-S6)  relevant  to  molecularly  targeted  therapeutic 
agents. 

This  project  has  been  in  no-cost  extension  for  the  past  year. 


Task  1.  Develop  robust  protocols  for  multiplex  immunostaining  of  human  breast  cancer  specimens 
We  have  developed  robust  immunostaining  protocols  for  detecting  analyte  antigens  in  paraffin-embedded  human  breast 
cancer  specimens  that  report  on  cell  signaling  events  (p-ERK,  p-AKT,  p-STAT3,  p-S6),  cell  fate  decisions  (Ki-67)  and 
biomarkers  of  prognostic  and  predictive  value  (ER,  PR,  HER2).  Immunohistochemical  (IHC)  staining  for  these  analytes 
using  chromogenic  substrates  (DAB,  diaminobenzidine)  consistently  yielded  the  highest  percentage  of  positive  tumor 
cells  (with  low  background  staining).  Staining  using  fluorescent  reporters  yielded  comparable  results  for  some  of  the 
analytes  (p-ERK,  Ki-67,  HER2),  making  immunofluorescent  (IF)  staining  an  option  when  studying  these  analytes.  For 
other  analytes  (e.g.  p-AKT,  p-STAT3),  however,  IF  staining  yielded  far  fewer  positive  cells,  indicating  that  IF  was 
significantly less sensitive than IHC for studying these analytes. Structural and cell-type antigens (epithelial cytokeratin,
CK,  E-cadherin,  HER2)  used  for  typing  segmented  nuclei  and  cells  as  breast  carcinoma  cells  are  generally  abundant  and 
equally  well  revealed  by  IHC  or  by  IF  immunostaining.  Putting  these  individual  antigen  stains  together  in  multiplex 
immunostaining  protocols,  we  use  IHC  staining  for  the  single  analyte  stained  on  a  slide  (employing  DAB  chromogen) 
followed  by  IF  staining  for  the  structural/cell-type  antigens  on  the  slide.  The  exception  is  when  HER2  is  our  analyte,  in 
which  case  we  use  IF  staining  for  both  HER2  analyte  and  CK  cell-type/structural  antigen.  To  prevent  nonspecific 
staining  by  the  secondary  anti-mouse  Ig  antibodies  used  in  subsequent  IF  staining  (due  to  cross-binding  to  mouse 
antibodies  applied  previously  during  analyte  IHC  staining),  we  “strip”  off  all  antibodies  following  DAB  staining  by 
incubating  slides  in  a  5%  SDS  solution  at  50°C  for  5  minutes.  If  two  analytes  are  to  be  studied  in  the  same  tumor  slide 
(e.g.  p-ERK+Ki67  or  ER+PR),  we  perform  dual  analyte  IHC  staining  using  DAB  as  the  reporter  for  one  analyte  and  SG 
Blue  as  the  reporter  for  the  other  analyte.  We  stain  for  cell-type/structural  antigens  following  completion  of  IHC  analyte 
immunostaining. We found that staining for phospho-epitopes (p-ERK, p-AKT, p-STAT3) diminishes in cut tumor
sections a month after sectioning. We have therefore made it a policy to immunostain slides for phospho-epitope analytes
within two weeks of sectioning.


Task  2.  Optimize  multispectral  imaging  and  data  capture  for  subsequent  computational  analysis 
We have tested various image capture conditions and parameters to determine the optimal settings for acquiring images for
accurate analyte quantification and for adequate sampling of breast cancer specimens. Imaging is performed using
multispectral  imagers  sold  by  Cambridge  Research  Instrumentation  (CRi,  Woburn,  MA).  Attached  to  an  epifluorescence 
microscope,  these  capture  photons  from  420nm-720nm  wavelength  in  both  brightfield  and  fluorescent  modes  to  record 
chromogen  and  fluorochrome  staining  in  brightfield  and  fluorescent  “data  cubes”,  respectively.  We  image  at  400X 
magnification,  which  usually  acquires  150-300  breast  cancer  cells  for  analysis  in  each  image.  We  usually  acquire  10 
images  from  each  specimen,  so  1500-3000  tumor  cells  are  analyzed.  In  the  last  year  of  this  project,  we  began  to  use  the 
Vectra  Multispectral  Imaging  System  (CRi)  which  includes  a  multispectral  microscope  equipped  with  a  robotic  slide 
loader  and  a  computer-controlled  stage.  It  includes  Inform  software  (CRi)  which  allows  automated  image  acquisition  at 
200X  magnification  selected  from  regions  of  interest  (ROI)  determined  at  40X  magnification.  The  software  has  the 
ability to identify regions of breast carcinoma cells in hematoxylin-stained images, and this is used to select the ROI at
low magnification for sampling at higher magnification. We will incorporate this imaging system into our analysis
platform as it offers the potential to type cells as breast carcinoma cells without immunostaining. Following image
acquisition, Nuance software (CRi) “unmixes” the image data cubes into component stains (“channels”) using the pure
spectra of the individual component stains. Currently, we can separate staining by two chromogens and up to five
fluorochromes in the same space into distinct channels.
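
As a rough illustration of what spectral unmixing involves, the sketch below solves a linear least-squares problem for every pixel against the pure stain spectra. It is not the Nuance implementation, whose internals are not described in this report; the function name `unmix_cube`, the array layout and the clipping of negative abundances are assumptions made for the example. For brightfield chromogens the linear model would normally be applied to optical density rather than to raw transmitted intensity.

```python
import numpy as np


def unmix_cube(cube, pure_spectra):
    """Least-squares linear unmixing (illustrative only).

    cube:         (H, W, B) array holding B spectral bands per pixel.
    pure_spectra: (K, B) array, one reference spectrum per stain.
    Returns an (H, W, K) array of per-stain abundances, clipped at zero.
    """
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).T  # (B, H*W): one column per pixel
    # Solve pure_spectra.T @ abundances = pixels for all pixels at once.
    abundances, *_ = np.linalg.lstsq(pure_spectra.T, pixels, rcond=None)
    return np.clip(abundances.T.reshape(h, w, -1), 0.0, None)
```

Each output plane plays the role of one unmixed "channel"; the quality of the result depends on how well the reference spectra match the stains actually present.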


Task  3.  Develop  computational  algorithms  for  multispectral  immunohistological  image  analysis 
FARSIGHT  software  was  developed  to  quantify  intrinsic  and  associative  parameters  associated  with  cells  segmented 
from  multispectral  image  data.  Starting  with  multispectral  images  of  multiplexed  immunostained  slides  that  have  been 
unmixed  into  individual  stain  channels  as  described  above,  the  software  is  able  to  quantify  key  breast  cancer  molecular 
biomarkers  (ER,  PR,  Ki67,  and  HER2/neu)  on  a  cell-by-cell  basis.  Initially,  each  nucleus  in  an  image  is  segmented  using 
data  in  the  hematoxylin  (or  DAPI)  channel.  Cell  identification/enumeration  and  all  subsequent  steps  in  cytometric 
analysis depend on accurate nuclear segmentation, and Publication #1 (Trans. Biomed. Eng. 57:841-852 [2010])
describes  development  of  our  approach  for  obtaining  accurate  nuclear  segmentation.  Subsequent  to  nuclear 
segmentation,  whole  tumor  cells  are  segmented  based  on  the  delineated  nuclei,  and  cells  classified  into  cell-types  of 
interest  based  on  their  association  with  immunostaining  for  cell-type  antigens.  The  intra-cellular  distribution  of 
molecular  biomarkers  and/or  analytes  is  then  determined  for  the  cell  types  of  interest  and  quantified  for  each  cell. 
Publication #2 (Histopathology - accepted with revisions) details the development of FARSIGHT for whole cell
segmentation, cell classification and analyte association. FARSIGHT is currently being developed to analyze many images at
a  time  without  human  intervention/input  (batch  image  processing). 
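
The final step described above, associating analyte staining with segmented cells, can be sketched as a per-label average over an unmixed analyte channel. This is a minimal example and not FARSIGHT code: the function name `mean_intensity_per_cell` and the convention that label 0 marks background are assumptions made for illustration.

```python
import numpy as np


def mean_intensity_per_cell(labels, analyte):
    """Mean analyte signal inside each segmented object (illustrative only).

    labels:  (H, W) integer array; 0 is background, 1..N are segmented cells.
    analyte: (H, W) array holding one unmixed analyte channel.
    Returns an array of length N + 1; entry k is the mean signal in object k.
    """
    flat = labels.ravel()
    sums = np.bincount(flat, weights=analyte.ravel())  # summed signal per label
    counts = np.bincount(flat)                         # pixel count per label
    return sums / np.maximum(counts, 1)                # unused labels give 0
```

Entry 0 of the result is the background mean, which can serve as a reference when deciding whether a cell counts as positive.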


Task  4.  Validate  immunohistological  analysis  system  and  data  obtained 

With  FARSIGHT  developed  for  segmenting  nuclei  and  whole  breast  cancer  cells  in  appropriately  stained  histological 
images  and  able  to  quantify  ER,  PR  and  HER2  expression,  we  are  in  the  process  of  comparing  FARSIGHT 
quantification  of  these  biomarkers  in  breast  cancer  specimens  with  quantification  performed  by  expert  pathologists 
(validation  studies). 

Please  note  that  FARSIGHT  software  has  been  made  freely  available  for  use  by  the  scientific  community  and  is  not 
patented. 

Key  Research  Accomplishments 

We  have  developed  robust  tissue  immunostaining  protocols  that  allow  breast  cancer  specimens  to  be  stained  for  various 
biomarkers  and  biological  analytes  using  DAB  chromogen  along  with  histochemical  staining  for  nuclei  (with 
hematoxylin) and immunostaining for cytosol (CK) and plasma membrane (E-Cad) by fluorescence.

We  have  optimized  multispectral  image  capture  of  specimens  immunostained  for  multiple  antigens  and  structures  such 
that  staining  for  each  is  separated  into  distinct  channels. 

We  have  incorporated  the  new  Vectra  Multispectral  Imaging  System  (CRi)  with  its  Inform  software  system  into  our 
analytical  platform. 

We  have  optimized  FARSIGHT  software  for  nuclear  segmentation  based  on  hematoxylin  staining  (see  Publication  #1). 

We  have  developed  FARSIGHT  software  for  whole  cell  segmentation;  the  program  segments  cells  in  images  stained  for 
cytosolic  and/or  plasma  membrane  antigens  and  compartmentalizes  each  into  nuclear  and  extranuclear  compartments  for 
the  purpose  of  subcellular  (nuclear  vs.  extranuclear)  analyte  quantification  and  distribution  (see  Publication  #2). 


Reportable  Outcomes  (Publication  bibliography) 

Publication  #1 

Al-Kofahi  Y,  Lassoued  W,  Lee  WMF,  and  Roysam  B:  Improved  automatic  detection  and 
segmentation of cell nuclei in histopathology images. Trans. Biomed. Eng. 57:841-852 (2010).

Publication  #2 

Al-Kofahi  Y,  Lassoued  W,  Grama  K,  Nath  SK,  Zhu  J,  Oueslati  R,  Feldman  M,  Lee  WMF,  and 
Roysam  B:  Cell-based  quantification  of  molecular  biomarkers  in  histopathology  specimens, 
(revision  submitted  to  Histopathology  -  decision  pending). 
Conclusions

Multiple  antigens  reporting  on  prognostic/predictive  biomarkers,  cell  signaling  and  cell  fate  decisions  can  be  stained  for 
on  the  same  slide  in  human  breast  cancer  specimens,  along  with  markers  of  different  subcellular  compartments  in  tumor 
cells. 

Following  separation  of  staining  for  these  analytes  and  subcellular  compartment  markers  into  separate  channels  by 
multispectral  microscopy  and  spectral  unmixing,  image  analysis  can  accurately  and  rapidly  segment  nuclei  and  cells  in 
images  and  associate  analytes  with  the  segmented  cells  and  subcellular  compartments  computationally. 
Improved Automatic Detection and Segmentation
of Cell Nuclei in Histopathology Images

Yousef  Al-Kofahi,  Wiem  Lassoued,  William  Lee,  and  Badrinath  Roysam*,  Senior  Member,  IEEE 


Abstract: Automatic segmentation of cell nuclei is an essential step in image cytometry and histometry.
Despite substantial progress, there is a need to improve accuracy, speed, level of automation, and
adaptability to new applications. This paper presents a robust and accurate novel method for segmenting
cell nuclei using a combination of ideas. The image foreground is extracted automatically using a
graph-cuts-based binarization. Next, nuclear seed points are detected by a novel method combining
multiscale Laplacian-of-Gaussian filtering constrained by distance-map-based adaptive scale selection.
These points are used to perform an initial segmentation that is refined using a second
graph-cuts-based algorithm incorporating the method of alpha expansions and graph coloring to reduce
computational complexity. Nuclear segmentation results were manually validated over 25 representative
images (15 in vitro images and 10 in vivo images, containing more than 7400 nuclei) drawn from diverse
cancer histopathology studies, and four types of segmentation errors were investigated. The overall
accuracy of the proposed segmentation algorithm exceeded 86%. The accuracy was found to exceed 94% when
only over- and undersegmentation errors were considered. The confounding image characteristics that led
to most detection/segmentation errors were high cell density, high degree of clustering, poor image
contrast and noisy background, damaged/irregular nuclei, and poor edge information. We present an
efficient semiautomated approach to editing automated segmentation results that requires two mouse
clicks per operation.

Index Terms: Image cytometry, cell nuclei, histopathology, segmentation.

I.  Introduction 

THE GOAL of this study is to develop efficient and accurate algorithms for detecting and segmenting
cell nuclei in 2-D histological images. This is commonly a first step to counting cells, quantifying
molecular markers (antigens) of interest in healthy and pathologic specimens [1], [2], and also for
quantifying aspects of normal/diseased tissue architecture [1]. The cell nuclei may be stained using
fluorescent markers [e.g., 4',6-diamidino-2-phenylindole (DAPI)], or with histochemical stains (e.g.,
hematoxylin). It is important in these applications to be able to detect the correct number of cells
with high accuracy, and to delineate them accurately with utmost automation and minimal human effort.
It is also helpful to be able to easily adapt the software algorithms to images of different tissues
captured under differing imaging conditions.

Automated segmentation of cell nuclei is now a well-studied topic for which a large number of
algorithms have been described in the literature [2]-[18], and newer methods continue to be
investigated. The main challenges in segmenting nuclei in histological, especially pathological,
tissue specimens result from the fact that the specimen is a 2-D section of a 3-D tissue sample. The
2-D sectioning can result in partially imaged nuclei, sectioning of nuclei at odd angles, and damage
due to the sectioning process. Furthermore, sections have finite thickness, resulting in overlapping
or partially superposed cells and nuclei in planar images. The end result of these limitations is a
set of image objects that differ considerably from the ideal of round blob-like shapes. Their sizes
and shapes in images can be irregular, and not always indicative of their 3-D reality. There is
natural variability among nuclear shapes and sizes even when they are ideally sectioned. With
pathological samples, nuclei can exhibit unnatural shapes and sizes. Variable chromatin texture is
another source of segmentation error: highly textured nuclei are harder to segment, especially when
they are densely clustered. Separation of densely clustered cell nuclei is a long-standing problem in
this field. The presence of a large number of nuclei in the field (especially whole-slide images)
necessitates methods that are computationally tractable, in addition to being effective. Finally,
imaging noise in the background regions, especially for fluorescence data, and the presence of
spectral unmixing errors in processed multispectral images result in additional errors.

Perhaps the most critical aspect of nuclear segmentation algorithms is the process of detecting a set
of points in the image, usually one per cell nucleus and close to its center, that are variously
referred to as “markers” or “seeds.” These points are used by subsequent algorithms to delineate the
spatial extent of each cell nucleus. Indeed, the accuracy of the segmentation depends critically on
the accuracy and reliability of the initial seed points. Several approaches have been used to detect
seed points. The early work in this field [3], [19] relied upon the peaks of the Euclidean distance
map. This method is often used in conjunction with the watershed algorithm [9] due to its
computational efficiency and ready availability. However, it has the widely acknowledged disadvantage
of detecting too many seeds, leading to over-segmentation. Some efforts at addressing this limitation
include filtering of seeds based on mutual proximity [3], incorporation of additional cues such as
the image intensity gradient [9], and the use of region merging algorithms as a postprocessing step
[10], [11]. Another technique is to detect local maxima points in the gray-scale image using the
h-maxima transform [16], [20]. This method was found to be overly sensitive to image texture, and
resulted in overseeding with our images. The Hough transform [21] has also been used for detecting
seed points [2], [6]. This method is practical for nearly circular nuclei, and requires excessive
computation. More recently, the very elegant iterative radial voting algorithm was presented in [22],
and has been used in several papers [14], [23]. This method requires edge extraction based on
gradient thresholding, and a careful choice of several parameters that proved impractical in the
automated pathology context. In [24], a regularized centroid transform was used. This method only
uses the binarized image and does not exploit additional cues present in the image intensity data.
In [8], a gradient flow-tracking algorithm was used. Like the radial voting idea, this method is
conceptually elegant. The difficulty with this method in our experiments was the rough chromatin
texture that produces inaccurate flow values and/or directions.

In this paper, we present a method that overcomes many of the limitations of the aforementioned
methods. It is based on the multiscale Laplacian-of-Gaussian (LoG) filter originally introduced by
Lindeberg [25] as a generic blob detection method. Recently, Byun et al. [4] used a blob detector
based on the LoG filter at a fixed scale (set empirically) to count cells in retinal images. This
method offers important advantages, including computational efficiency, ability to exploit shape and
intensity information, ease of implementation, especially the ability to specify the approximate
expected sizes of nuclei, and robustness to variations. Building upon this study, and keeping in mind
the challenges specific to histopathology images noted earlier, we propose a method combining the LoG
filter with automatic and adaptive scale selection.

Aside from advances in seed detection, the field of automated image analysis has also witnessed the
emergence of a new generation of image segmentation algorithms. Notable among these advances are
methods based on graph cuts [26]-[30] that offer the important advantage of computing globally
optimal solutions. Additional advances have been reported across the literature. Notwithstanding
these advances, several needs have remained. For instance, the graph-cuts algorithm requires
effective initialization. For this, we present a method in which the results of seed detection are
processed by a new generation of fast clustering algorithms to generate an initial segmentation that
is subsequently refined using the graph-cuts segmentation algorithm. Another important need is to be
able to segment large connected clusters of nuclei efficiently and accurately. For this, we introduce
a novel segmentation algorithm based on automatic graph coloring and the method of α-expansions.
Overall, the effectiveness of the combination of these methods is demonstrated on breast
histopathology images. Fig. 1 shows a flowchart illustrating the main steps of our method.


Fig. 1. Flowchart outlining the main steps of the proposed nuclear segmentation algorithm. The
initial segmentation and refinement steps are illustrated in Fig. 2. The optional editing step is
illustrated in Fig. 3.

II.  Materials  and  Methods 

A.  Histology  and  Nuclear  Staining 

For the in vivo tissue examples, deparaffinized 5 μm sections of formalin-fixed, paraffin-embedded
human breast tissues were rehydrated, and stained with hematoxylin (Vector Laboratories, Burlingame,
CA). For the in vitro tissue examples, 6 μm sections of OCT frozen blocks of cultured K1735 tumor
cells were stained with DAPI (Vector Laboratories, Burlingame, CA).

B.  Image  Capture 

Images of hematoxylin or DAPI stained histopathology slides were captured using a Nuance
multispectral camera (CRI, Inc., Woburn, MA) mounted on a Leica epifluorescence microscope (Leica
DMRA2). Images were captured using full resolution of the Nuance camera at 8 bits of data per pixel
and with 10 nm spectral widths from 420 to 720 nm for brightfield images, and 440-480 nm for DAPI.
Nuance software was used to unmix the chromogens and fluorophores in the data cube into a set of
nonoverlapping channels based on user-provided reference spectra of the pure chromogens or
fluorochromes, respectively. We denote the raw spectral data cube collected by the instrument
$I(x, y, \lambda)$, where $(x, y)$ are spatial coordinates of a pixel, and $\lambda$ is the
wavelength. The spectral unmixing procedure results in multiple nonoverlapping channels that are
denoted as follows. The nuclear channel is denoted $I_N(x, y)$. This paper is primarily concerned
with the processing of $I_N(x, y)$.

C.  Automatic  Image  Binarization 

The first step in nuclear segmentation is to separate the foreground pixels in the nuclear channel
$I_N(x, y)$ from the background pixels. Several approaches have been presented in the literature, and
a survey on image thresholding methods can be found in [31]. Common methods include histogram-based
[32], clustering-based [33]-[35], and entropy-based [7] algorithms. More advanced techniques are
based on graph-cuts [15] and level set [36] algorithms, but they require good
initialization/training. With this in mind, we propose a hybrid approach that starts with an initial
binarization that is subsequently refined using the graph-cuts algorithm.
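
The initial binarization can be sketched as an exhaustive search over candidate thresholds, assuming the Poisson-mixture minimum error criterion that the following paragraphs define in (1)-(3). This is a minimal illustration rather than the implementation used in this work; the function names are hypothetical, and treating $\mu_0 \ln \mu_0$ as 0 when $\mu_0 = 0$ is an added assumption.

```python
import math


def _xlogx(x):
    """x * ln(x), using the limiting value 0 at x = 0."""
    return x * math.log(x) if x > 0 else 0.0


def poisson_min_error_threshold(hist):
    """Return t* for a normalized histogram hist[i] = h(i), i = 0..I_max."""
    n = len(hist)
    mu = sum(i * h for i, h in enumerate(hist))  # mean of the whole image
    best_t, best_cost = None, math.inf
    for t in range(n - 1):
        p0 = sum(hist[: t + 1])   # background prior P0(t)
        p1 = sum(hist[t + 1:])    # foreground prior P1(t)
        if p0 <= 0.0 or p1 <= 0.0:
            continue              # one class is empty at this threshold
        mu0 = sum(i * hist[i] for i in range(t + 1)) / p0
        mu1 = sum(i * hist[i] for i in range(t + 1, n)) / p1
        # Criterion (3): mu - P0 (ln P0 + mu0 ln mu0) - P1 (ln P1 + mu1 ln mu1)
        cost = (mu
                - p0 * math.log(p0) - p0 * _xlogx(mu0)
                - p1 * math.log(p1) - p1 * _xlogx(mu1))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

The search is quadratic in the number of bins, which is negligible for the 128-bin histograms used here; cumulative sums would make it linear if needed.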

For the initial binarization, we compute the normalized image histogram, denoted $h(i)$, where $i$
denotes the intensity of a pixel in the range $\{0, \ldots, I_{\max}\}$. We found that 128 bins were
adequate for these histograms. For the specimens studied here, the histograms were found to be
bimodal as expected, and are modeled well by a mixture of two Poisson distributions. This modeling
choice was supported by analysis of the image formation process [37], prior literature [37], and
empirical comparison against the more commonly used mixture of Gaussians model [34], [38], [39]. We
used the Poisson-distribution-based minimum error thresholding algorithm [38], [39]. The normalized
image histogram for the mixture of Poisson distributions is written as


$$h(i) = P_0 \times p(i \mid 0) + P_1 \times p(i \mid 1) \tag{1}$$

where $P_0$ and $P_1$ are the a priori probabilities of the background and foreground regions, and
$p(i \mid j)$, $j = 0, 1$ are Poisson distributions with means $\mu_j$. For a threshold $t$, the
Poisson mixture parameters are given by

$$P_0(t) = \sum_{i=0}^{t} h(i), \qquad \mu_0(t) = \frac{1}{P_0(t)} \sum_{i=0}^{t} i \times h(i)$$

$$P_1(t) = \sum_{i=t+1}^{I_{\max}} h(i), \qquad \mu_1(t) = \frac{1}{P_1(t)} \sum_{i=t+1}^{I_{\max}} i \times h(i). \tag{2}$$

The optimal threshold $t^*$ is chosen to minimize an error criterion [38], as follows:

$$t^* = \arg\min_t \{ \mu - P_0(t)(\ln P_0(t) + \mu_0(t) \ln \mu_0(t)) - P_1(t)(\ln P_1(t) + \mu_1(t) \ln \mu_1(t)) \} \tag{3}$$

where $\mu$ is the mean intensity of the complete image. The result of thresholding $I_N(x, y)$ using
$t^*$ is refined by incorporating spatial continuity constraints. We seek the pixel labeling
$L(x, y)$ that minimizes the following energy function:

$$E(L(x, y)) = \sum_{(x, y)} D(L(x, y); I_N(x, y)) + \sum_{(x, y)} \sum_{(x', y') \in N(x, y)} V(L(x, y), L(x', y')) \tag{4}$$

where $N(x, y)$ is a spatial neighbor of pixel $(x, y)$. The globally optimal labeling is computed
using the widely used graph-cuts algorithm [26]-[30], [40], [41]. The first term in (4) is the data
term representing the cost of assigning a label to a pixel. It has two possible values depending upon
whether the foreground or background model is used. Mathematically, this is written as follows:

$$D(L(x, y); I_N(x, y)) = -\ln p(I_N(x, y) \mid j), \quad j \in \{0, 1\}. \tag{5}$$

The second term is the pixel continuity term that penalizes different labels for neighboring pixels.
Following [26], this is written as follows:

$$V(L(x, y), L(x', y')) = \eta(L(x, y), L(x', y')) \times \exp\left( -\frac{|I_N(x, y) - I_N(x', y')|}{2 \sigma_L^2} \right) \tag{6}$$

where

$$\eta(L(x, y), L(x', y')) = \begin{cases} 1, & \text{if } L(x, y) \neq L(x', y') \\ 0, & \text{if } L(x, y) = L(x', y'). \end{cases}$$

The $V$-term penalizes different labels for neighboring pixels when
$|I_N(x, y) - I_N(x', y')| < \sigma_L$. In our work, the scale factor $\sigma_L$ is set empirically
to values in the range 20-30 pixels. Lower values are used when the image is smooth, and higher
values are used when the nuclear chromatin is highly textured. We used an implementation of the fast
max-flow/min-cut algorithm described by Boykov and Kolmogorov [27]. The previous method results in
accurate binarization results. Fig. 2(B) provides a visual example of the binarization results for
the image in Fig. 2(A).

D. Automatic Seed Detection and Initial Segmentation

The graph-cuts binarization algorithm extracts connected clusters of nuclei that must be separated
into individual nuclei. This requires identification of initial markers (a.k.a. seed points) such
that there is one marker per cell. For the present work, multiscale LoG filter based approaches
proved to be the most effective. The LoG filter is given by

$$\mathrm{LoG}(x, y; \sigma) = \frac{\partial^2 G(x, y; \sigma)}{\partial x^2} + \frac{\partial^2 G(x, y; \sigma)}{\partial y^2} \tag{7}$$


where $\sigma$ is the scale value, and $G(x, y; \sigma)$ is a Gaussian with 0 mean and scale
$\sigma$. When applied to an image containing blob-like objects, this filter produces a
scale-selective peak response at the center of each object with radius $r$ when
$r = \sigma\sqrt{2}$. The main advantage of this filter is that the locations of these peaks are
robust to the chromatin texture that has a much smaller scale value compared to the nuclear blobs.
The filtering results form a topographic surface that provides a basis for cell segmentation. In
addition, as we describe shortly, it provides additional useful information about the boundaries of
touching nuclei. A direct application of the multiscale LoG to images of nuclei would be naive, since
our tissue specimens contain a heterogeneous population of cell types with different nuclear sizes.
For this, we propose a multiscale LoG with automatic scale selection, as described by Lindeberg [25].
While this multiscale method greatly improved upon the fixed-scale method, as expected, it was
nevertheless inadequate, as illustrated in Fig. 2(C) and (D). In particular, this method fails over
heterogeneous clusters of nuclei with different sizes, and weak separating edges. In these cases, it
is possible for clusters of 2 or more small nuclei to be detected falsely as a single larger blob
that may also encroach on smaller blobs in its vicinity. Overcoming this issue requires a more
sophisticated control over the scale values. Our
Fig. 2. Illustrating key steps of the proposed nuclear segmentation method. (A) Nuclear channel from spectral unmixing. (B) Foreground extraction results. Pixels
marked yellow represent a large connected component. (C) Surface plot of the multiscale LoG filtering results for a small region. (D) Initial segmentation based
on the LoG. (E) Surface plot of the distance-map-constrained multiscale LoG. (F) Improved initial segmentation resulting from the distance-constrained LoG.
(G) Color coding of the yellow pixels in panel (B). (H) Final segmentation of the image in panel (A). Panels (I and J), (K and L), (M and N), (O and P), (Q and R), and
(S and T) indicate initial and final segmentation closeups taken from different regions in the image shown in panel (H).


method to achieve such control relies on exploiting shape and size cues available in the Euclidean
distance map $D_N(x, y)$ of the binarized image [42], [43]. Our method proceeds as follows. We
compute the response of the scale-normalized LoG filter
$\mathrm{LoG}_{\mathrm{norm}}(x, y; \sigma) = \sigma^2 \mathrm{LoG}(x, y; \sigma)$ at multiple scales
$\sigma = [\sigma_{\min}, \ldots, \sigma_{\max}]$ in steps of 1. Then, we use the Euclidean distance
map to constrain the maximum scale values when combining the LoG filtering results across scales to
compute a single response surface denoted $R_N(x, y)$ as follows:

$$R_N(x, y) = \arg\max_{\sigma \in [\sigma_{\min}, \sigma_{\mathrm{MAX}}]} \{ \mathrm{LoG}_{\mathrm{norm}}(x, y; \sigma) * I_N(x, y) \} \tag{8}$$

where $\sigma_{\mathrm{MAX}} = \max\{\sigma_{\min}, \min\{\sigma_{\max}, 2 \times D_N(x, y)\}\}$.
In effect, the distance map constrains the maximum scale
value at each point. The response $R_N(x, y)$ can be thought of
as a topographical surface whose peaks indicate centers of
individual nuclei; these are the seed points (nuclear markers). We
identify the local maxima of $R_N(x, y)$, and impose a minimum
size (based on the expected range of nuclear diameters) to
filter out irrelevant maxima, as described further in the following
subsection. The effect of using the distance map constraint is
illustrated in Fig. 2. For instance, panels (C) and (D) show a
surface plot of the multiscale LoG and the corresponding initial
segmentation (discussed shortly), respectively. Clearly, the
central nucleus is oversmoothed and encroaches into its neighbors.
The reason for the encroachment is the use of a large $\sigma_{\max}$
that is needed to detect large cells in other regions of the image.
Fig. 2(E) and (F) show a surface plot of $R_N(x, y)$ and the
corresponding initial segmentation, respectively. The accuracy of
the seed locations and of the initial cell boundaries is
much improved by imposing the scale constraint.
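
The scale constraint of (8) can be illustrated with a minimal 1-D sketch. This is not the implementation used in this work: it assumes a 1-D signal, integer scales, a brute-force distance map, and a negated LoG so that bright blobs give positive peaks. The function names are hypothetical.

```python
import math

def distance_map_1d(mask):
    """Distance from each foreground sample to the nearest background sample."""
    n = len(mask)
    background = [i for i in range(n) if not mask[i]]
    return [min((abs(i - b) for b in background), default=n) if mask[i] else 0
            for i in range(n)]

def log_response_1d(signal, sigma):
    """Negated, scale-normalized 1-D LoG response (positive at bright blobs)."""
    radius = 4 * sigma
    offsets = range(-radius, radius + 1)
    # sigma^2 times the second derivative of a 1-D Gaussian
    kernel = [sigma ** 2 * (x * x / sigma ** 4 - 1 / sigma ** 2)
              * math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
              for x in offsets]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, x in enumerate(offsets):
            j = i + x
            if 0 <= j < n:  # samples outside the signal count as zero
                acc += kernel[k] * signal[j]
        out.append(-acc)
    return out

def constrained_response(signal, mask, sigma_min, sigma_max):
    """Per-sample maximum over the scales allowed by the distance map."""
    dist = distance_map_1d(mask)
    # sigma_MAX = max{sigma_min, min{sigma_max, 2 * D}}
    cap = [max(sigma_min, min(sigma_max, 2 * d)) for d in dist]
    scales = range(sigma_min, sigma_max + 1)
    per_scale = {s: log_response_1d(signal, s) for s in scales}
    response = [max(per_scale[s][i] for s in scales if s <= cap[i])
                for i in range(len(signal))]
    return response, cap
```

Because the cap never falls below the minimum scale, at least one scale is always available at every sample; near object borders, where the distance map is small, only the fine scales contribute.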

Using $R_N(x, y)$ and the seed points detected as described
earlier, we compute an initial segmentation of the nuclei.
The classical approach used by several authors
(including ourselves) is based on the watershed algorithm and
its many variants and improvements [2], [3], [5], [9]-[11], [16],
[44], [45]. This method has the advantage of speed, simplicity,
absence of adjustable parameters, and a degree of flexibility that
results from being able to modify the underlying distance map.
The main disadvantage of this algorithm for the present task
is its sensitivity to even minor peaks in the distance map, which
results in oversegmentation and detection of tiny regions as
objects. To address this problem, we propose an alternate method
based on size-constrained clustering. The use of clustering for
nuclear/cell segmentation is not new, and predates the watershed
method, e.g., [44] and [46]. However, clustering methods have
been computationally expensive and difficult to scale to large
images. Recently, Wu et al. [47] described the local-maximum
clustering algorithm, which overcomes the previous limitations
and paved the way for the present work. This algorithm
has a resolution parameter r that is used to define a search area,
denoted A(x, y), of size $2r \times 2r$ around each pixel in $R_N(x, y)$.
In a nutshell, this algorithm uses the seed points as cluster
centers, and assigns each pixel in the foreground image to these
centers to form clusters.

To illustrate the effect of varying the resolution parameter on
the clustering (initial segmentation results), Fig. 3 shows two
1-D examples. The curve shown consists of several 1-D blobs
with different sizes. In panel (A), a small resolution parameter
(r = 3) is used and all three blobs are detected. The local maxima
(seed points) are indicated in dark red, and the vertical dashed-red
lines separate the blobs. The direction and length of the black arrows
indicate the assignment of each point to its local maximum in a
region (distance in 1-D) defined by the resolution parameter. In
panel (B), we use a larger value of the resolution parameter (r =
6). Therefore, the small blob (center) is bypassed and pixels to
its left are assigned to the local maximum on its right. Only
two blobs (with two corresponding seed points) were detected.
Note that the separating lines pass through the minima between
the two blobs. These points can be thought of as inflection
points, where one inflection point is present between blobs. In
2-D images, we have separating boundaries between 2-D blobs.

Fig. 3. Illustrating the local-maximum clustering method. A 1-D curve with
three blobs is used. The blob in the middle is very small compared to the others.
Two values for the resolution parameter are used. Using r = 3 in panel (A) results in
detecting all three blobs. In panel (B), the use of r = 6 resulted in missing the small
blob and merging it with the larger one on the right. The black arrows indicate
the assignments of points to their local maxima. The detected seed points are
displayed as red dots.

There are two major advantages of using this method over the
watershed method [48]. First, the resolution parameter r
provides the ability to avoid forming small clusters, as was clearly
shown in the two synthetic examples of Fig. 3. Second, the
clustering method works on foreground points only, which makes it
faster. In our experiments, this algorithm was comparable in speed
to the watershed, and often faster. In our study, the parameter
r was set empirically in the range of five pixels. Intuitively, r
specifies the smallest size of the clusters that we are willing to
accept for the next stage of processing.
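
The 1-D behavior described for Fig. 3 can be reproduced with a short sketch of local-maximum clustering. This is a simplified reading of the algorithm in [47], not the authors' code: each sample points to the largest value within a window of half-width r, and pointers are followed until a sample that is its own local maximum is reached. The curve values and the function name are made up for illustration.

```python
def local_max_clustering(values, r):
    """Label each sample with the index of the seed (local maximum) it flows to."""
    n = len(values)
    parent = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        # largest value in the window; ties go to the lowest index
        parent.append(max(range(lo, hi), key=lambda k: (values[k], -k)))
    labels = []
    for i in range(n):
        j = i
        while parent[j] != j:  # climb until a sample that is its own maximum
            j = parent[j]
        labels.append(j)
    return labels

# Three blobs: two large ones (peaks at 4 and 20) and a small one (peak at 12).
curve = [1, 3, 6, 8, 9, 8, 6, 3, 1, 0, 1, 2, 4, 2, 1,
         0, 1, 3, 7, 9, 10, 9, 7, 3, 1]

seeds_fine = sorted(set(local_max_clustering(curve, 3)))    # [4, 12, 20]
seeds_coarse = sorted(set(local_max_clustering(curve, 6)))  # [4, 20]
```

With r = 3 the small middle blob keeps its own seed; with r = 6 its peak falls inside the search window of the right-hand blob and is absorbed by it.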

E. Refinement of Initial Nuclear Segmentation
Using $\alpha$-Expansions and Graph Coloring

The segmentation contours produced by the aforementioned
cluster analysis are approximate because the clusters are formed
using $R_N(x, y)$ rather than the original image, and therefore
require further refinement using the image intensity.

The purpose of the refinement is to enhance the initial
contours between touching nuclei to better delineate the true edges
between them. To meet this goal, the segmentation refinement step
must satisfy three requirements. First, it should preserve the
shape of the cell nucleus based on some shape model. Second,
there should be some rules that prevent two or more nuclei from
being merged. This happens if we allow two nonneighbor
nuclei to encroach into a third one between them until they merge.
Third, given the large number of cell nuclei found in real
images, the refinement step should allow multiple cell nuclei
to be refined concurrently for efficiency.

As with the binarization refinement, this step is also
formulated as an energy minimization that is solved using a graph-cuts
algorithm. However, the problem here is more challenging since
we have multiple labels, where the number of labels equals the
number of cells in a connected component. In the binary case,
the graph-cuts method finds the global minimum in polynomial
time. However, finding a multiway cut such that the resulting
labeling configuration minimizes the energy function is known
to be NP-hard. Boykov et al. [29] introduced two algorithms,
known as $\alpha$-expansion and $\alpha$-$\beta$ swap, respectively, that can
efficiently find good approximate solutions to the multiway cut
to within a known factor of the global minimum. In this study,
the former is used. In the $\alpha$-expansion method, we formulate the
segmentation as an iterative binary labeling problem. At each
iteration, one label is set to an integer $\alpha$, and the rest of the
labels are set to another value, denoted $\bar{\alpha}$, where $\bar{\alpha} \neq \alpha$. Then,
a binary graph-cuts step (called an expansion) is carried out,
in which pixel labels are allowed to change in one direction,
from $\bar{\alpha}$ to $\alpha$. The border of the cell labeled with $\alpha$ is refined
by expanding it into its neighbors until the energy function is
minimized. In the ideal case, the energy function will reach its
minimum when the segmentation contour delineates the true
nucleus contour, at which the gradient is maximum. The data
and smoothness terms of the energy function should be chosen
carefully in order to achieve that goal. For the $\alpha$-expansion
method to work, the smoothness term, denoted V, has to be a
metric, which requires three conditions to hold [29]. Given any three
pixel labels $L_1$, $L_2$, and $L_3$, the three conditions are listed as
follows:

1) $V(L_1, L_2) = 0 \Leftrightarrow L_1 = L_2$;

2) $V(L_1, L_2) = V(L_2, L_1) \geq 0$;

3) $V(L_1, L_2) \leq V(L_1, L_3) + V(L_3, L_2)$.

We used a spatially varying smoothness function similar to
the one used in the binarization step:

$$V(L(x, y), L(x', y')) = \eta(L(x, y), L(x', y')) \times \exp(-|I_N(x, y) - I_N(x', y')|)$$

where

$$\eta(L(x, y), L(x', y')) = \begin{cases} \text{Const}, & \text{if } L(x, y) \neq L(x', y') \\ 0, & \text{if } L(x, y) = L(x', y'). \end{cases}$$
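
As a quick sanity check, the following sketch evaluates this smoothness term for one pair of neighboring pixels and verifies the three metric conditions by brute force over a small label set. The value of Const, the intensities, and the function names are assumptions chosen for illustration.

```python
import math
from itertools import product

CONST = 1.0  # hypothetical value of the constant "Const"

def smoothness(label_p, label_q, intensity_p, intensity_q):
    """V for two neighboring pixels p and q with the given labels and intensities."""
    eta = CONST if label_p != label_q else 0.0
    return eta * math.exp(-abs(intensity_p - intensity_q))

def is_metric(labels, intensity_p, intensity_q):
    """Check the three metric conditions over all label triples for one pixel pair."""
    def v(a, b):
        return smoothness(a, b, intensity_p, intensity_q)
    for a, b, c in product(labels, repeat=3):
        if (v(a, b) == 0.0) != (a == b):      # condition 1
            return False
        if v(a, b) != v(b, a) or v(a, b) < 0:  # condition 2
            return False
        if v(a, b) > v(a, c) + v(c, b):        # condition 3
            return False
    return True
```

For a fixed pixel pair the exponential factor is a positive weight, so V reduces to a scaled Potts term. The check assumes intensity differences small enough that the exponential does not underflow to zero.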


The previous smoothness function is minimized when the
labeling discontinuities occur at the edges between the nuclei. The
data term at each pixel depends on the likelihood of assigning
it to each label (nucleus). As mentioned earlier, the LoG
output profile of each nucleus is roughly similar to a Gaussian. In
addition, the elliptical shape of the cell is similar to that of the
2-D Gaussian. Hence, a Gaussian model is used to represent
each cell. Maximum likelihood estimation (MLE) is used to
estimate the Gaussian parameters. The inputs to the MLE are the
(x, y) coordinates of the pixels of each nucleus, weighted by
the pixelwise LoG responses. The likelihood for a pixel (x, y)
to be assigned to cell i is $G(x, y; \mu_i, \Sigma_i)$, where $\mu_i$ and $\Sigma_i$
are the mean and the covariance matrix of the ith Gaussian,
respectively.
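
The weighted Gaussian fit can be sketched as follows. This is an illustrative reading of the description above, assuming that the weights are non-negative LoG responses and that the fitted covariance is non-degenerate; the function names are hypothetical.

```python
import math

def fit_weighted_gaussian(points, weights):
    """Weighted ML estimate of the mean and covariance of 2-D pixel coordinates."""
    total = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / total
    my = sum(w * y for (_, y), w in zip(points, weights)) / total
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights)) / total
    syy = sum(w * (y - my) ** 2 for (_, y), w in zip(points, weights)) / total
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights)) / total
    return (mx, my), (sxx, sxy, syy)

def gaussian_likelihood(x, y, mean, cov):
    """2-D Gaussian density G(x, y; mu, Sigma); requires det(Sigma) > 0."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy ** 2
    dx, dy = x - mx, y - my
    # quadratic form with the inverse of the 2x2 covariance matrix
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))
```

Pixels with a strong LoG response pull the mean toward the nuclear center, so the fitted Gaussian follows the blob profile rather than the raw outline of the initial cluster.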

Unfortunately, the $\alpha$-expansion method is not practical when
the number of cells in a connected component is large (>20),
leading to an excessive number of expansions that require an
impractical amount of computer memory and time. To address
this difficulty, we propose a novel method based on graph
coloring that is described next. We start by noting that when the
$\alpha$-expansion procedure is applied to an initially segmented cell,
it will only expand into its neighboring regions. This is because
the expansion procedure will not assign a pixel to a distant cell.
Therefore, we recast the problem to use a small number of
labels, with each label shared by a large number of cells that
expand in parallel. This is achieved by using a graph coloring approach
similar to the one used in [13], but we differ in the use of a
two-level region adjacency graph. Using the initial segmentation, we
build a region adjacency graph. Unlike our prior work [10], [11],
we now use a two-level adjacency graph in which a cell is
adjacent to its direct neighbors, and to the neighbors' neighbors
as well. The second level of adjacency is added to reduce the
possibility that two nonneighboring cells with the same color
merge after an expansion. The graph is then colored sequentially
such that no two adjacent cells have the same color. Choosing
the number of colors is a challenge since the well-known
four-color theorem [49] does not apply in our case because of our
two-level structure. The problem of finding the minimal number
of colors is nondeterministic polynomial-time hard (NP-hard).
For these reasons, we use a sequential coloring method that is
simple to implement, but does not necessarily yield the smallest
number of colors. Fig. 2(G) shows the coloring output for the
initial segmentation of a connected component. This connected
component [also shown in yellow in the binarization in
Fig. 2(B)] contains 123 nuclei, but only eight colors
are used.
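
The two-level adjacency and sequential coloring described above can be sketched as follows. This is a generic greedy coloring offered for illustration, not the authors' code; the dictionary-based graph representation, the visiting order, and the five-cell chain are assumptions.

```python
def two_level_adjacency(adjacency):
    """Extend a region adjacency graph with neighbors of neighbors."""
    extended = {}
    for cell, neighbors in adjacency.items():
        reach = set(neighbors)
        for n in neighbors:
            reach |= set(adjacency[n])
        reach.discard(cell)  # a cell is not adjacent to itself
        extended[cell] = reach
    return extended

def sequential_coloring(adjacency):
    """Greedy coloring: each cell takes the smallest color unused by its neighbors."""
    colors = {}
    for cell in sorted(adjacency):
        taken = {colors[n] for n in adjacency[cell] if n in colors}
        color = 0
        while color in taken:
            color += 1
        colors[cell] = color
    return colors

# Hypothetical chain of five touching nuclei: 0-1-2-3-4.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
two_level = two_level_adjacency(chain)
colors = sequential_coloring(two_level)
```

On a plain chain two colors would suffice, but with the second adjacency level the same chain needs three, which shows how the extra level trades a few more labels for a lower risk of same-colored cells merging.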

The resulting colors are used as labels for the $\alpha$-expansion
step. At each iteration, all the nuclei with a given color are
assigned the label $\alpha$, while all others are assigned $\bar{\alpha}$. Then,
$\alpha$ cells are expanded concurrently into $\bar{\alpha}$ cells. As a result,
just a few (usually fewer than 10) expansions are needed regardless
of the much larger number of cells in a connected cluster. The
smoothness term described earlier is a pixel-level function, since
it depends on the local gradient between adjacent pixels, and
hence it is not affected by the grouping of cells based on graph
coloring. On the other hand, the data term is a function defined
on a cell level, since it is based on a Gaussian model of the cell.
Therefore, it is modified in order to compute likelihoods of
assignment to groups (colors) rather than individual cells. Suppose
that the number of colors assigned to a connected component
with $N_c$ cells is $N_r$, where $N_r \ll N_c$. The likelihood that a pixel
(x, y) will be assigned the jth color is

$$p(L(x, y) = j) = \max_{i} \{ G(x, y; \mu_i, \Sigma_i) \mid C_i = j \}, \quad j = 1, \ldots, N_r \qquad (9)$$

where $C_i$ denotes the color assigned to cell i. The corresponding
data term that represents the penalty for
assigning pixel (x, y) to color j is

$$D(L(x, y) = j; I_N(x, y)) = -\ln p(L(x, y) = j). \qquad (10)$$
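
The color-level data term can be sketched for a single pixel as follows, assuming that the per-cell Gaussian likelihoods at that pixel have already been evaluated and are strictly positive. The function name and the example values are hypothetical.

```python
import math

def color_data_terms(cell_likelihoods, cell_colors):
    """Penalty -ln(max likelihood) per color, from per-cell likelihoods at one pixel."""
    best = {}
    for cell, p in cell_likelihoods.items():
        color = cell_colors[cell]
        # keep the largest likelihood among the cells sharing this color
        best[color] = max(best.get(color, 0.0), p)
    return {color: -math.log(p) for color, p in best.items()}

# Three cells; cells 0 and 2 share color 0, cell 1 has color 1.
terms = color_data_terms({0: 0.5, 1: 0.1, 2: 0.2}, {0: 0, 1: 1, 2: 0})
```

Taking the maximum means that a pixel is charged according to the best-fitting cell of each color, so distant cells that happen to share a color do not affect the penalty.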


The segmentation refinement consists of multiple iterations
of $\alpha$-expansion up to a preset maximum number of iterations
(usually 3), or until no change in any pixel label will reduce the
energy function. Finally, the resulting objects are renumbered
to achieve consistency with the numbers of the initial objects.
In Fig. 2, panels (I and J), (K and L), (M and N), (O and P),
(Q and R), and (S and T), respectively, represent initial and
refined segmentation closeups taken from different regions in
the image shown in panel (H) of the same figure.

Fig. 4. Illustrating the impact of (optional) seed editing on the final
segmentation. (A), (E), and (I) Initial segmentations are shown in the first column for three
selected regions. Yellow crosses indicate locations of mouse clicks requesting
pairs of segmented objects to be merged. (B), (F), and (J) Results of merging are
shown in the second column, in addition to the user's requests to split objects
indicated as blue crosses. The blue crosses are initial seed locations for the
two new objects. (C), (G), and (K) Results of splitting are shown in the third
column. (D), (H), and (L) Final segmentation after refinement is shown in the
fourth column.

F.  Efficient  Computer-Assisted  Editing  of  Automated 
Segmentation  Results 

Automatic segmentation algorithms can provide fast and
accurate segmentation of nuclei. However, segmentation errors
cannot be avoided even when using optimal parameter values.
Hence, human interaction might be needed to fix some of the
segmentation errors in order to obtain the highest level of accuracy.
Human interaction should be kept minimal by tuning
the segmentation parameters to reduce the number of errors. In
addition, the editing method should be easy and fast. In
this study, two types of errors (defined in the next section) can be
corrected using manual editing. The first type is oversegmentation,
which is corrected by merging fragments of oversegmented
nuclei. The merging is performed on pairs of neighboring objects
by clicking on one point inside each of them. Fig. 4(A), (E),
and (I) show closeups of initial segmentation results. User-selected
points (mouse clicks) for pairs of objects that need to
be merged are shown in yellow. The merging results are shown
in Fig. 4(B), (F), and (J).

The second type of error that can be corrected is
undersegmentation. An undersegmented object is split into two objects
by clicking at two points inside it. An automatic splitting method
is used to draw an initial contour between the two new objects.

The splitting method starts by computing the approximate
Euclidean distances from each point inside the undersegmented
object to the manually selected points. Then, the splitting is done
based on the minimum of the two distances at each point. Blue
crosses in Fig. 4(B), (F), and (J) represent user-selected pairs of
points indicating objects that need to be split. Fig. 4(C), (G), and
(K) show the automatic splitting results. Changes to the initial
segmentation caused by editing are also applied to all the
images needed in the segmentation refinement step. As described
previously, the refinement step uses a graph-cuts-based
technique ($\alpha$-expansion), where both the initial segmentation and
the LoG output [$R_N(x, y)$] are needed. The editing methods
mentioned earlier update the labels in the initial segmentation.
The LoG output $R_N(x, y)$ is updated as
follows. In the case of oversegmentation, the LoG output profile
of the oversegmented object is replaced by the inverted distance
map from the center of the new object produced by merging.
In the case of undersegmentation, the LoG output profile of the
undersegmented object is replaced by the inverted distance map
from the centers of the two new objects resulting from splitting.
Fig. 4(D), (H), and (L) show the final segmentation after refinement.
A related two-mouse-click technique for interactive whole-cell
segmentation was presented in [50], where the user segments one
cell at a time by clicking on a point at the center of the cell and
another one on its border. The image is then transformed into
polar coordinates, a dynamic programming algorithm is used to
find the optimal path along the cell border from left to right, and
finally that path is mapped back into Cartesian coordinates.
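
The two editing operations can be sketched on a label map stored as a dictionary from pixel coordinates to object labels. This illustration uses exact rather than approximate Euclidean distances and omits the update of the LoG profile; the data layout and function names are assumptions.

```python
import math

def split_object(pixels, click_a, click_b, label_a, label_b):
    """Assign each pixel of an undersegmented object to the nearer of two clicks."""
    labels = {}
    for (x, y) in pixels:
        dist_a = math.hypot(x - click_a[0], y - click_a[1])
        dist_b = math.hypot(x - click_b[0], y - click_b[1])
        labels[(x, y)] = label_a if dist_a <= dist_b else label_b
    return labels

def merge_objects(labels, keep, absorb):
    """Relabel every pixel of object `absorb` as object `keep`."""
    return {p: (keep if lab == absorb else lab) for p, lab in labels.items()}

# A one-pixel-high object of six pixels, split by clicks at its two ends.
row = [(x, 0) for x in range(6)]
split = split_object(row, (0, 0), (5, 0), 1, 2)
merged = merge_objects(split, 1, 2)
```

The split boundary is the perpendicular bisector of the two clicks restricted to the object, which only serves as an initial contour for the subsequent refinement.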

One drawback of the presented editing tool is the need to
scan the image visually to search for segmentation errors. This
can be a time-consuming task in very large images with large
numbers of cells. One possible solution is to adopt the approach
presented by the same group [51] in which a segmentation
confidence score is computed for each segmented cell nucleus based
on some morphological and intensity-based features. The lower
the confidence score, the more likely a segmentation error is.
Then, segmented cell nuclei are sorted based on their
confidence scores and the user inspects them starting
from those with low confidence values. Yet another approach
that is explored as part of the FARSIGHT project
(www.farsight-toolkit.org) is to identify outliers (based on one or more features)
to detect nuclei that require further inspection for potential
editing. In general, the use of pattern analysis tools to guide the user
for expedited editing is a topic of ongoing research.
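
Both review strategies can be sketched in a few lines. The confidence scores are assumed to be given, since the scoring of [51] is not reproduced here. The outlier rule (a modified z-score based on the median absolute deviation, with the conventional constants 0.6745 and 3.5) is a generic stand-in, not the FARSIGHT method.

```python
import statistics

def review_order(confidence):
    """Nucleus IDs sorted so that the least confident segmentations come first."""
    return sorted(confidence, key=confidence.get)

def feature_outliers(feature, threshold=3.5):
    """IDs whose feature value has a modified z-score above the threshold."""
    values = list(feature.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be flagged
    return sorted(k for k, v in feature.items()
                  if 0.6745 * abs(v - med) / mad > threshold)

# Hypothetical scores and nuclear areas (in pixels) keyed by nucleus ID.
order = review_order({1: 0.9, 2: 0.2, 3: 0.5})
flagged = feature_outliers({1: 100, 2: 105, 3: 98, 4: 102, 5: 400})
```

Here nucleus 5, with an area about four times the median, would be presented for inspection as a possible undersegmentation.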

III. Results and Validation

The results of automated analysis for 25 representative
images (15 in vitro images and ten in vivo images, containing more
than 7400 nuclei in all) drawn from diverse cancer
histopathology studies were inspected and scored manually with the goal
of developing a conservative assessment of the frequency and
types of segmentation errors. The manual scoring was recorded
electronically, and a sample is displayed in Fig. 5(A). In this
figure, the type of error is indicated as a color-coded dot. Seeds
of correctly segmented nuclei are displayed as green dots.
Undersegmentation errors (i.e., a failure to split a region into the
correct number of nuclei) are indicated as dark blue dots.
Oversegmentation errors (i.e., excessive splitting) are indicated as
purple dots. In addition to these standard types of errors, we
also looked for encroachment errors (indicated as yellow dots)
that occur when the automated algorithms do not correctly
place the boundary between a pair of touching nuclei. In other
words, it is the error in delineating the true border between two
nuclei. This type of error includes the case of nuclei encroaching
on their neighbors, or nuclei encroached over by their neighbors.
The main difficulty with quantification of encroachment errors
is its innate subjectivity. Another difficulty is the acceptance
threshold. In a strict sense, one could argue successfully that
every adjacent pair of nuclei suffers from some encroachment
error. In our work, a slight encroachment of a few pixels that
does not change the nucleus shape or size significantly is not
considered an error. We only consider moderate to severe
encroachment errors in which the error corresponds to at least
25% of the total nucleus area. Although this manual observation
is still subjective and may vary from one observer to another, it
can give a good approximation of the number of encroachment
errors. The last type of segmentation error is the binarization
error. We examined errors from automatic
binarization of the image data. The binarization is visualized as
boundaries overlaid on the image [see Fig. 5(A)]. Incorrectly
binarized nuclei are indicated with light blue dots in Fig. 5(A).
We considered errors for which a cell nucleus or part of it is
missed at the binarization step.

Fig. 5. Illustrating the results validation criteria. (A) Segmentation output
of an image with color-coded seeds on each nucleus to identify whether it
is correctly segmented or the type of segmentation error. (B) Example of an
undersegmentation error. (C) Example of an oversegmentation error. (D) Example
of an encroachment error. (E) Example of a binarization error.

Finally, Fig. 6 shows six typical examples of segmented
nuclear images. All of the scoring results are provided
to the reader in the electronic supplement (available:
www.ecse.rpi.edu/~roysam/TBME-2010-Supp/). Table I
summarizes the error analysis using 25 images. The first 15 in the
table are in vitro images while the last ten are in vivo. Overall,
considering under- and oversegmentation errors alone, our
fully automated algorithm achieved >94% accuracy. These data
are helpful in comparing our algorithm to previously published
methods [9], [18]. When encroachment and binarization errors
are included, our algorithm showed an accuracy of more than
86%. The performance of our algorithm with regard to over-
and undersegmentation errors can be described in terms of
precision and recall measures. Specifically, these values are listed
in the last two columns of Table I. The overall F-measure
(2 × precision × recall)/(precision + recall) for these data is
0.97. We studied the performance of our binarization
refinement step by comparing its output with the initial binarization
using twenty 2-D phantom images for which we have ground
truth data. For each image, we compared the percentages of
incorrectly labeled pixels before and after binarization refinement
using graph cuts, as detailed in Table II. Fig. 7(A) and (B) show
a sample phantom image and the corresponding ground truth.
The initial binarization output is shown in panel (C), while the
refinement output is shown in panel (D). A significant
improvement is achieved after applying graph-cuts refinement.
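
The summary figures quoted above can be reproduced from the totals of Table I. The definitions of precision and recall used below are an inference, not a statement from the text: oversegmented nuclei are counted as false positives and undersegmented nuclei as false negatives, a reading that matches the tabulated per-image values that were checked.

```python
def segmentation_scores(n_cells, correct, under, over):
    """Precision, recall, F-measure, and accuracies from error counts."""
    precision = correct / (correct + over)
    recall = correct / (correct + under)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        # accuracy counting only under- and oversegmentation as errors
        "accuracy_under_over": (n_cells - under - over) / n_cells,
        # accuracy counting all four error types
        "accuracy_all": correct / n_cells,
    }

# Totals row of Table I: 7441 cells, 6419 correct, 312 under, 125 over.
totals = segmentation_scores(7441, 6419, 312, 125)
```

With these inputs the precision, recall, and F-measure round to 0.98, 0.95, and 0.97, and the two accuracies are about 94.1% and 86.3%.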

Finally, we studied the complexity reduction achieved using
graph coloring by comparing segmentation processing times
with and without graph coloring for 15 automatically created
phantom images. All the images have the same size (300 × 300),
with only one connected component (cluster of nuclei), and a
varying number of nuclei in each cluster (10 to 150). Table III
shows a summary of the analysis. Increasing the number of
nuclei in the cluster results in rapidly increasing processing time
when graph coloring is not used. That is because the
number of required $\alpha$-expansions is equal to the number of nuclei
in the cluster. However, no significant increase in processing
time is noted when graph coloring is used, since the number of
$\alpha$-expansions is equal to the number of colors, which is in the
range of 5 to 10 colors. Three sample phantom images are shown
in Fig. 8(A)-(C) containing 10, 70, and 130 nuclei, respectively.
The segmentation results are shown as red outlines. A graphical
representation of the results in Table III is shown in Fig. 8(D),
which shows 2-D plots of the number of cells in the connected
component (cluster) versus the processing time for both cases.
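
To make the trend concrete, the short sketch below computes the speedup (time without coloring divided by time with coloring) for four rows of Table III. The timings are copied from the table; the choice of rows and the variable names are ours.

```python
# (true number of cells, colors used, seconds without coloring, seconds with coloring)
table_iii_rows = [
    (10, 5, 5.047, 4.782),
    (70, 8, 19.188, 6.844),
    (130, 10, 37.267, 8.235),
    (150, 9, 43.079, 8.156),
]

# Ratio of processing times for each selected row, keyed by cluster size.
speedups = {cells: without / with_coloring
            for cells, _, without, with_coloring in table_iii_rows}

for cells, ratio in speedups.items():
    print(f"{cells:4d} nuclei: {ratio:.2f}x faster with graph coloring")
```

The ratio grows from roughly 1.06 for the 10-nucleus cluster to roughly 5.28 for the 150-nucleus cluster, consistent with the number of expansions being tied to the number of colors rather than to the number of nuclei.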

IV.  Discussion 

The present work has built upon, integrated, and extended
multiple recent advances in the biological image analysis field.
The resulting algorithms have proved to be extremely robust and
accurate. In our experience, the usually tricky task of choosing
the optimal parameter settings for the proposed algorithm is both
simple and intuitive. When errors do occur, our method
of editing the seeds, followed by segmentation refinement, is
extremely efficient in practice. It requires minimal effort, and
makes best use of the human observer's ability to discern
complex patterns and resolve ambiguities. The actual segmentation
is best carried out computationally.

There are several known sources of the errors that we
analyzed. Oversegmentation usually happens when a nucleus'
chromatin is highly textured (especially true for large nuclei) or
when the nucleus shape is extremely elongated. This is
particularly common with nuclei that deviate significantly from a
blob shape, as is the case with some vascular endothelial cells.
Undersegmentation usually occurs when nuclei (especially
small ones) are highly clustered with weak borders between
the nuclei. The causes of encroachment errors were much more
diverse, and most often involved weak object separation cues
in the image. The types of errors mentioned earlier are, to some
extent, influenced by the choice of parameter settings. This is
discussed further shortly. Binarization errors were largely due
to variations in the nuclear signal intensity; specifically, a weak
signal resulted in most binarization errors.

Fig. 6. Sample segmentation results of six 2-D nuclear images including [(A) and (B)] two in vitro images and [(C)-(F)] four in vivo images.

A traditional difficulty with automated algorithms is the
effort required to tune them by selecting appropriate parameter
settings for new images and applications. In this regard, the
algorithms we described are well-behaved and intuitive. The main
TABLE I

Summary of Segmentation Performance Data for 25 Sample Images*

Image ID    Number of Correctly  Under-     Over-      Encroachment  Binarization  Precision  Recall
            Cells     Segmented  Segmented  Segmented  Errors        Errors
1           29        27         2          0          0             0             1.00       0.93
2           75        74         0          0          0             1             1.00       1.00
3           154       145        2          2          2             3             0.99       0.99
4           258       232        8          4          10            4             0.98       0.97
5           61        60         1          0          0             0             1.00       0.98
6           98        84         6          2          2             4             0.98       0.93
7           97        96         1          0          0             0             1.00       0.99
8           59        55         1          0          1             2             1.00       0.98
9           53        47         2          1          0             3             0.98       0.96
10          33        33         0          0          0             0             1.00       1.00
11          71        63         3          0          2             3             1.00       0.95
12          34        34         0          0          0             0             1.00       1.00
13          153       148        2          0          1             2             1.00       0.99
14          56        43         6          2          3             2             0.96       0.88
15          156       143        6          2          3             2             0.99       0.96
16          637       487        40         10         89            11            0.98       0.92
17          627       532        33         3          55            4             0.99       0.94
18          375       317        12         11         9             26            0.97       0.96
19          766       644        29         24         50            19            0.96       0.96
20          873       779        25         14         36            19            0.98       0.97
21          641       531        33         17         49            11            0.97       0.94
22          575       491        38         3          37            6             0.99       0.93
23          563       498        11         16         27            11            0.97       0.98
24          608       512        32         12         40            12            0.98       0.94
25          401       344        25         3          17            12            0.99       0.93
Total       7441      6419       312        125        430           155           0.98       0.95
Percentage            86.3%      4.2%       1.7%       5.8%          2.1%

*All of the images and segmentation results are available as an electronic supplement to facilitate
high-resolution viewing.


TABLE II

Comparison of Binarization Accuracy Before and After
Graph-Cut Refinement

Image ID    Error in initial binarization (%)   Error after refinement (%)
1           3.3                                 1.4
2           5.8                                 3.0
3           9.0                                 4.1
4           10.7                                3.8
5           13.4                                5.1
6           13.4                                6.0
7           14.0                                6.2
8           13.2                                5.5
9           12.8                                6.5
10          12.1                                5.6
11          12.3                                6.2
12          11.5                                5.4
13          11.7                                6.3
14          12.7                                6.1
15          11.2                                5.2
16          13.2                                6.2
17          11.1                                5.3
18          12.4                                5.1
19          12.8                                6.2
20          13.5                                6.4
Average     11.4                                5.3

parameters that must be provided to the software include the
minimum scale for the LoG filter $\sigma_{\min}$ and the maximum scale
value $\sigma_{\max}$, which define the expected range of sizes of the
nuclei. In our experiments, we used values in the range of four
to eight pixels for $\sigma_{\min}$, and 10-20 pixels for $\sigma_{\max}$. Although
our algorithms are multiscale by design, the choice of these

Fig. 7. Comparing initial and graph-cut refined binarization results using a
phantom image for which the ground truth is known. (A) 2-D phantom image.
(B) Binarization ground truth. (C) Initial binarization output. (D) Results of
binarization refinement using graph cuts.


TABLE  III 

Illustrating  the  Complexity  and  Processing  Time  Reduction 
After  Using  Graph  Coloring 


                                                        Segmentation Time (seconds)
Image ID   True number   Number of        Number of     Without      With
           of cells      detected cells   colors used   coloring     coloring
1           10            10               5             5.047       4.782
2           20            20               7             6.235       5.094
3           30            30               6             8.204       5.422
4           40            35               7             7.422       5.281
5           50            47               8             9.031       5.828
6           60            58               9            15.282       6.031
7           70            67               8            19.188       6.844
8           80            79               9            20.516       7.003
9           90            93               8            20.626       6.672
10         100            97              10            27.204       7.547
11         110           104              10            26.001       8.625
12         120           113               8            32.563       7.672
13         130           124              10            37.267       8.235
14         140           137               9            40.032       8.328
15         150           142               9            43.079       8.156

parameters affects the balance of over- and undersegmentation
errors to a small extent. Between these two parameters, $\sigma_{\min}$
is more influential. Specifically, if the value of $\sigma_{\min}$ is much
smaller than the expected minimum size of the nuclei, then the
incidence of oversegmentation increases. Smaller values of this
parameter are also needed to account for small fragments of
nuclei that are characteristic of 2-D sections of 3-D tissue. On
the other hand, if the value of $\sigma_{\max}$ is too low, oversegmentation
errors become more prevalent. An overly high value of $\sigma_{\max}$ is
much more benign in nature because it is used in combination
with the distance map; it can result in undersegmentation or
encroachment errors when exceptionally large and highly clustered
groups of nuclei are encountered. The clustering resolution
parameter $r$ was generally chosen in the range of 3-5 pixels,
and the weighting parameter $\sigma_L$ for the graph-cuts segmentation
algorithm was in the range of 20-30.
Fig. 8. Illustrating the effect of graph coloring using 15 phantom images
with the same size and one nuclear cluster, but with different numbers of nuclei
(x-axis). Three examples are shown in (A)-(C) containing 10, 70, and 130
nuclei, respectively. Detected seeds are shown as green dots and nuclear
segmentation results are shown as red outlines. (D) Number of cell nuclei in the
cluster versus segmentation processing time (without graph coloring in red and
with graph coloring in blue).
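
The timing benefit summarized in Table III and Fig. 8(D) can be made explicit with a little arithmetic. The sketch below copies the true cell counts and the two timing columns from Table III and computes the ratio of the two times for each phantom image; the names `TABLE_III` and `speedups` are illustrative only and are not part of the FARSIGHT software.

```python
# Rows copied from Table III:
# (true number of cells, time without coloring [s], time with coloring [s])
TABLE_III = [
    (10, 5.047, 4.782), (20, 6.235, 5.094), (30, 8.204, 5.422),
    (40, 7.422, 5.281), (50, 9.031, 5.828), (60, 15.282, 6.031),
    (70, 19.188, 6.844), (80, 20.516, 7.003), (90, 20.626, 6.672),
    (100, 27.204, 7.547), (110, 26.001, 8.625), (120, 32.563, 7.672),
    (130, 37.267, 8.235), (140, 40.032, 8.328), (150, 43.079, 8.156),
]


def speedups(rows):
    """Map each cell count to (time without coloring) / (time with coloring)."""
    return {n: without / with_coloring for n, without, with_coloring in rows}


for n_cells, ratio in speedups(TABLE_III).items():
    print(f"{n_cells:4d} nuclei: {ratio:.2f}x faster with graph coloring")
```

With these values the ratio grows from roughly 1.06 for the 10-nucleus phantom to roughly 5.28 for the 150-nucleus phantom, which is the widening gap between the red and blue curves in Fig. 8(D).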


The algorithm described here is incorporated into the
FARSIGHT toolkit [51] that is designed to analyze multiparameter
histopathology images. This software system and implementations
of the algorithms reported here are available to
interested colleagues from the corresponding author.
ABSTRACT


Aims: To develop a computer-assisted technology for objective, cell-based quantification of molecular biomarkers in
specified cell types in histopathology specimens, advancing beyond current visual estimation and pixel-level (rather
than cell-based) quantification methods.

Methods:  Tissue  specimens  are  multiplex  immunostained  to  reveal  cell  structures,  cell  type  markers,  and 
analytes,  and  imaged  using  multi-spectral  microscopy.  The  image  data  are  processed  using  novel  software 
that  automatically  delineates  and  types  each  cell  in  the  field,  measures  morphological  features,  and  quantifies 
analytes  in  different  sub-cellular  compartments  of  specified  cells. 

Results:  The  methodology  was  validated  using  cell  blocks  composed  of  differentially  labeled  cultured  cells 
mixed in known proportions, and evaluated on human breast carcinoma specimens for quantifying HER2, ER,
PR,  Ki67,  p-ERK  and  p-S6.  Automated  cell-level  analyses  closely  matched  human  assessments,  but 
predictably  differed  from  pixel-level  analyses  of  the  same  images. 

Conclusions:  Our  method  reveals  the  type,  distribution,  morphology,  and  biomarker  state  of  each  cell  in  the 
field,  and  allows  multiple  biomarkers  to  be  quantified  over  specified  cell-types,  regardless  of  abundance.  It  is 
ideal  for  studying  specimens  from  patients  in  clinical  trials  of  targeted  therapeutic  agents,  for  investigating 
minority  stromal  cell  subpopulations,  and  phenotypic  characterization  to  personalize  therapy  and  prognosis. 


INTRODUCTION 

Histopathologic  evaluation  of  tissue  samples  is  indispensable  for  cancer  diagnosis,  classification,  and 
management  (1,  2),  and  is  an  important  tool  in  animal-based  research  (3,  4).  Thin  tissue  sections  are  stained 
with  hematoxylin,  eosin  and/or  other  chemical  stains  to  reveal  cell  and  tissue  structures.  Antibody  staining  to 
reveal  specific  molecular  biomarkers  is  increasingly  used  to  improve  cancer  diagnosis  and  classification, 
establish  prognosis,  and  determine  therapy.  Even  as  molecular  biomarkers  play  a  growing  role,  the  scoring  of 
stained  specimens  remains  largely  a  visual  and  subjective  process:  Cells  are  coarsely  scored  as  positive  or 
negative  or  graded  for  degree  of  antigen  staining,  the  percentage  of  positive  cells  is  estimated  visually,  and 
overall  scores  are  arbitrarily  binned/scaled.  This  process  requires  considerable  expertise  and  is  susceptible  to 
inter-observer  variability,  despite  standardization  efforts  (5-13).  The  use  of  rough  composite  score  scales 
(e.g., 0, 1+, 2+, 3+ staining) is a tacit acknowledgement of the inherent imprecision and subjectivity involved.

Recently,  computer-automated  methods  have  been  developed  to  quantify  antigen  expression  in  tissue 
images  (14-17),  offering  objectivity,  reproducibility,  and  quantification  on  a  continuous  scale.  Most  operate 
by measuring the number of pixels stained for one or more antigens and quantifying co-localization of stains.
However, they quantify at the level of individual pixels, groups of pixels or image regions, and not at the
level of individual cells, which are the fundamental units at which many biological processes occur. This is
largely  due  to  the  lack  of  sufficiently  reliable  automated  methods  to  segment  (delineate)  individual  cells, 
identify  subcellular  compartments  within  cells,  and  quantify  biomarkers  within  the  subcellular  regions.  We 
set  forth  an  approach  that  leverages  recent  advances  in  imaging,  image  analysis,  and  pattern  theory  to  enable 
biomarkers  to  be  analyzed  and  quantified  on  a  cell-by-cell  basis,  providing  additional  data  that  cannot  be 
obtained by pixel-level analysis and advancing prior efforts (18, 19). Our segmentation algorithms are
capable  of  delineating  sub-cellular  compartments  using  image  cues  and  geometric  constraints.  The 
subcellular  compartment  segmentations  are  consistently  linked,  enabling  correct  analysis  in  situations  that 
challenge  pixel-level  analytical  methods,  e.g.,  multiple  markers  that  are  not  co-localized  but  are  present  in 
the  same  cell.  Importantly,  our  method  explicitly  identifies  cell  types,  permitting  selective  measurement  of 
biomarker expression in cell sub-populations regardless of their abundance.
RESEARCH DESIGN & METHODS


Tissue Staining: De-paraffinized 5 µm sections of formalin-fixed, paraffin-embedded human breast
tissues were treated with citric acid (pH = 6) for 15 min at 90°C prior to staining. Antibodies used for
immunostaining included monoclonal mouse anti-human estrogen receptor (ER), anti-human progesterone
receptor (PR), anti-human Ki67, anti-epithelial membrane antigen (EMA), rabbit polyclonal anti-HER2
(Dako, Carpinteria, CA), rabbit anti-phospho(p)-ERK, anti-phospho(p)-S6 (Cell Signaling, Danvers, MA),
and mouse anti-multi cytokeratin (CK) monoclonal antibodies (Vector Laboratories, Burlingame, CA). ER,
PR, Ki67, p-ERK and HER2 were detected by immunohistochemistry (IHC) using biotinylated species-specific
secondary antibodies, avidin-linked horseradish peroxidase (HRP) (ABC Kit) and 3,3'-diaminobenzidine
(DAB) or SG blue (Vector Laboratories) HRP chromogen substrate. CK, EMA and p-S6
immunostaining was detected by fluorescence using Zenon Alexa Fluor 488 mouse IgG1 labeling
(Invitrogen, Carlsbad, CA), fluorescently labeled secondary antibodies (Invitrogen, Carlsbad, CA) or the
ABC fluorescence detection kit. After immunostaining, slides were counterstained with hematoxylin.

Individual  slides  were  stained  with  combinations  of  the  above  antibodies  to  reveal  antigens  that  reported 
on  cell  compartments,  cell  type  and  molecular  analytes  in  each  slide.  Multiplex  staining  protocols  were 
developed  to  minimize  or  avoid  the  opportunity  for  nonspecific  staining  by  secondary  antibodies.  Both 
chromogenic  and  fluorescent  reporters  were  frequently  used  on  the  same  slide,  and  only  fluorochromes  that 
could  be  resolved  spectrally  were  used  on  the  same  slide. 

Tissue imaging: A Nuance® multispectral camera (CRI Inc., Woburn, MA) on a Leica DMRA2
epifluorescence microscope was used to record images at 400x magnification, 8 bits/pixel, at 10 nm
wavelength intervals from 420-720 nm in both brightfield and fluorescent modes. Nuance software was used
to spectrally unmix the data into distinct channels representing hematoxylin and the individual chromogens
and fluorochromes based on their pure spectra.

Figure  1  shows  a  sample  breast  cancer  specimen.  The  brightfield  image  (panel  A)  reveals  hematoxylin 
staining.  Panel  B  shows  the  hematoxylin  channel,  unmixed  using  its  spectral  signature  (Panel  F),  revealing 
cell nuclei. Such unmixed channels are ideal for automated segmentation because they are monochrome and
often contain only one type of object. Panel C shows the channel corresponding to CK fluorescent staining,
which  reveals  the  cytoplasmic  domain  of  cells  of  epithelial  origin.  Panel  D  shows  the  channel  corresponding 
to  HER2  fluorescent  staining,  which  reveals  the  plasma  membrane  of  breast  cancer  cells  expressing  this 
biomarker.  We  use  this  image  as  a  running  example  to  illustrate  the  segmentation  methods  and  process. 

Image  Analysis  Overview:  Our  segmentation  strategy  focuses  on  cells  whose  nuclei  are  visible  in  the 
nuclear  channel  since  they  mark  individual  cells  -  these  are  segmented  first.  Second,  the  cytosolic  boundaries 
of  cells  whose  nuclei  are  detected  are  segmented  based  on  markers  and  geometric  constraints.  The  third  step 
quantifies  cell  and  nuclear  morphologies,  and  measures  biomarker  expression  over  cellular  compartments. 
Using  these  data,  we  identify  cell  types,  classify  cells  as  being  positive/negative  for  antigens,  and  organize 
the  measurements  by  cell  type  and  sub-cellular  compartment. 

1.  Automated  Segmentation  of  Cell  Nuclei:  We  used  our  fully  automated  segmentation  algorithm  (20) 
that  improves  upon  the  prior  literature  (21-35).  Importantly,  it  is  capable  of  automatic  selection  of  parameter 
settings.  It  starts  by  binarizing  the  image  using  the  Graph-Cuts  method  with  automatic  learning  of  foreground 
and  background  intensity  profiles  using  minimum  error  thresholding  (36,  37).  Next,  a  multi-scale  Laplacian 
of Gaussian (LoG) filter, with automatic and adaptive scale selection (20), is used to identify nuclear centers.
These points are used to generate an initial segmentation (38) that is refined using a multi-label graph-cuts
algorithm with alpha-expansions (39) and graph-coloring (40). Figure 2A shows sample automated
segmentation  results  for  the  image  in  Figure  1  as  red  outlines  overlaid  on  the  nuclear  channel  displayed  in 
grayscale.  The  green  dots  indicate  nuclear  centers  whose  locations  and  identifications  (IDs)  are  used  in 
subsequent  steps.  Given  the  importance  of  this  step,  the  user  is  provided  with  graphical  tools  to  inspect  the 
results  and  correct  any  errors  before  proceeding  to  the  next  step. 
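
The idea of selecting a LoG scale automatically can be illustrated with a minimal sketch. It evaluates the scale-normalized LoG response at a single candidate nuclear center for every integer scale in a user-supplied range and keeps the scale with the strongest response. This is a simplification offered under stated assumptions: it is plain scale-space selection on a synthetic disk, it does not reproduce the adaptive scale selection of the published algorithm (20), and the function names `normalized_log`, `response_at`, `select_scale` and `make_disk` are hypothetical.

```python
import math


def normalized_log(dx, dy, sigma):
    """Scale-normalized Laplacian of Gaussian, sigma^2 * LoG, at offset (dx, dy)."""
    r2 = dx * dx + dy * dy
    gauss = math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    return gauss * (r2 - 2.0 * sigma ** 2) / sigma ** 2


def response_at(image, cx, cy, sigma):
    """Negated filter response at (cx, cy); bright blobs give positive values."""
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                total += value * normalized_log(x - cx, y - cy, sigma)
    return -total


def select_scale(image, cx, cy, sigma_min, sigma_max):
    """Return the integer scale in [sigma_min, sigma_max] with the strongest response."""
    return max(range(sigma_min, sigma_max + 1),
               key=lambda s: response_at(image, cx, cy, s))


def make_disk(size, cx, cy, radius):
    """Synthetic 'nucleus': a binary disk on a size x size grid."""
    return [[1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0.0
             for x in range(size)] for y in range(size)]


nucleus = make_disk(41, 20, 20, 7)
print(select_scale(nucleus, 20, 20, 4, 10))
```

For an ideal disk of radius R the normalized response peaks near R divided by the square root of 2, so a 7-pixel disk selects a scale of 5 pixels here. This also shows why the allowed scale range should bracket the expected nuclear sizes.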

2.  Automated  Delineation  of  Cytoplasmic  Domains:  This  step  generates  the  spatial  mask  for 
associating  cytoplasmic  markers  to  individual  cells  using  a  mix  of  cues  from  cytoplasmic  and  membrane 
markers  and  geometric  constraints.  For  example,  cytokeratins  (CK)  are  found  in  the  intra-cytoplasmic 
cytoskeleton  of  cells  of  epithelial  origin  (e.g.,  carcinoma  cells  in  Figure  1C),  so  they  indicate  cytoplasmic 
domains  of  a  selected  cell  population.  Cytoplasmic  markers  often  highlight  connected  multi-cellular  clusters 
that must be sub-delineated into individual cells to permit cell-by-cell analysis. The cues for this
sub-delineation vary. Sometimes, it is possible to highlight cell boundaries by staining for a membrane-associated
antigen, e.g., E-cadherin or EMA. Some analytes also can highlight membranes of cells, e.g., HER2 (Figure
2D). However, membrane labeling is often unreliable: HER2 is not always overexpressed, and E-cadherin
expression can be lost in some cancers. Even when good cytoplasmic and membrane-bound markers are
available, some ambiguities arise because histopathology slides are sections of three-dimensional
(3-D) specimens, and the sectioning plane cannot be planned accurately. For instance, the membranes of
cells may be visible but not the nuclei, or the membrane signal can appear over a nucleus, appearing to cut
across  it.  Finally,  cells  within  a  sample  can  show  a  variable  degree  of  staining.  Overall,  cytoplasmic 
segmentation  algorithms  must  be  capable  of  coping  with  variable  cues.  Our  strategy  is  to  avoid  direct 
segmentation  of  the  cytoplasmic/membrane  channels.  Instead,  we  leverage  the  validated  nuclear 
segmentations  and  build  an  adaptive  algorithm  that  exploits  cues  in  the  cytoplasmic  and/or  membrane 
channels,  when  they  are  available,  and  that  defaults  to  geometric  constraints  when  they  are  inadequate.  It 
automatically  switches  between  two  modes  (defined  below)  on  a  cell-by-cell  basis. 

Mode 0: This applies to cells with detectable cytoplasmic and/or membrane marker. The cytoplasmic
channel pixels $I_C(x,y)$ are automatically and adaptively binarized to separate the foreground and
background, using the Graph-Cuts algorithm (36, 37). Morphological opening and closing operators (radius
= 3 pixels) are used to fill holes. If the membrane channel, $I_M(x,y)$, is available, the magnitude of its
smoothed intensity gradient, $G_\sigma(x,y) = |\nabla_\sigma I_M(x,y)|$, is computed by convolving $I_M(x,y)$ with the
derivative of a Gaussian with $\sigma = 1.25$ pixels (fixed for a given magnification). If the membrane channel is
unavailable, we compute $G_\sigma(x,y) = |\nabla_\sigma I_C(x,y)|$ instead. The cues from the cytoplasmic and membrane
channels are integrated with geometric distances by computing a gradient-enhanced distance map $S(x,y)$,
with respect to the segmented nuclei. This is used to compare the cue-adjusted proximity of each pixel to
nuclei. If $d(i,j)$ denotes the Euclidean distance between neighboring foreground pixels $i = (x_i, y_i)$ and
$j = (x_j, y_j)$, the adjusted distance between them is $d(i,j) \times |G_\sigma(x_i,y_i) - G_\sigma(x_j,y_j)|$. The adjusted
distance between non-neighboring points $u_1 = (x_1, y_1)$ and $u_n = (x_n, y_n)$ is weighted by the length of the
shortest path (with 8-neighbor connectivity) connecting them. The value $S(x_F, y_F)$ at each cytoplasmic foreground
point is set to the minimum of all the adjusted distances from $(x_F, y_F)$ to all the nuclear
boundary points that are connected by a path over foreground points. Using the nuclei as the initial markers,
a marker-controlled watershed transform (41) is computed on $S(x,y)$. This produces a reliable
segmentation of the cytoplasmic foreground into sub-regions, with one cytoplasmic region per segmented
nucleus. Figure 2B shows sample cell segmentations of CK+ cells using Mode 0 with the gradient
information $G_\sigma(x,y)$ from the membrane channel. Figures 3, 5 and 6 and Supplementary Figure 2
exemplify segmentations without the benefit of the membrane signal.
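
To make the role of the gradient-adjusted distance concrete, the following sketch assigns each foreground pixel to the nucleus that reaches it at the lowest accumulated path cost, using a multi-source shortest-path search over 8-connected foreground pixels. It is a simplified stand-in for the marker-controlled watershed on the distance map, not the FARSIGHT implementation. Two assumptions are added for illustration only: the function name `label_cytoplasm`, and a small constant `eps` added to each step cost so that pure geometric distance breaks ties in regions where the gradient is flat.

```python
import heapq
import math


def label_cytoplasm(foreground, gradient, seeds, eps=0.01):
    """Assign each foreground pixel the label of the cheapest-to-reach nucleus.

    foreground: 2-D list of booleans (binarized cytoplasmic channel)
    gradient:   2-D list of smoothed gradient magnitudes (membrane or cytoplasm)
    seeds:      {(row, col): nucleus_label} for nuclear boundary pixels
    eps:        illustrative tie-breaking constant (an assumption, see lead-in)
    """
    rows, cols = len(foreground), len(foreground[0])
    dist = {p: 0.0 for p in seeds}
    label = dict(seeds)
    heap = [(0.0, p) for p in seeds]
    heapq.heapify(heap)
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if not foreground[nr][nc]:
                    continue
                # Step cost: Euclidean step length times the gradient difference.
                step = math.hypot(dr, dc) * (eps + abs(gradient[r][c] - gradient[nr][nc]))
                if d + step < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = d + step
                    label[(nr, nc)] = label[(r, c)]
                    heapq.heappush(heap, (d + step, (nr, nc)))
    return label


# One-row toy example: nuclei at both ends, a membrane-like ridge at column 6.
strip = [[True] * 9]
ridge = [[0, 0, 0, 0, 0, 0, 5, 0, 0]]
labels = label_cytoplasm(strip, ridge, {(0, 0): 1, (0, 8): 2})
print([labels[(0, c)] for c in range(9)])
```

In this toy strip a purely geometric split would fall midway between the two nuclei, at column 4. With the ridge present, the boundary moves to the ridge, so columns 0 to 5 go to nucleus 1 and columns 6 to 8 to nucleus 2.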

Mode 1: This is a geometric estimation that is invoked for cells for which cytoplasmic and membrane
labels are unavailable (e.g., stromal cells that are CK-). The traditional geometric approach based on Voronoi
diagrams (42, 43) produces unacceptably coarse polygonal approximations, so we use the Hamilton-Jacobi
Generalized Voronoi Diagram (HJ-GVD) (44) that uses the Euclidean distance from segmented nuclear
boundaries instead of their centroids to produce more refined estimates. We impose a radius constraint $r_{\max}$
on the HJ-GVD to prevent unrealistically large cell domain estimates. Figure 2C shows sample results for
the HER2 example using $r_{\max} = 12$ pixels. The estimated cell boundaries are overlaid on the Euclidean
distance map $D(x,y)$. Although these geometric estimates do not reflect the cellular reality (the structures
are unobservable), they are helpful for approximately associating extranuclear markers to cells when the
limitations of immunostaining do not permit additional labels for cytoplasmic & membrane markers.
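
The geometric fallback can be sketched as a brute-force, pixel-grid approximation: each pixel is assigned to the nucleus whose boundary is nearest in Euclidean distance, provided that distance does not exceed the radius constraint. This is only an illustration of the idea and not the Hamilton-Jacobi solver of (44); the function name `estimate_cell_domains` and the input layout are assumptions.

```python
import math


def estimate_cell_domains(shape, nuclei, r_max):
    """Label each pixel with the nearest nucleus, or 0 if farther than r_max.

    shape:  (rows, cols) of the image
    nuclei: {label: [(row, col), ...]} boundary pixels of each segmented nucleus
    r_max:  maximum allowed distance from a nuclear boundary, in pixels
    """
    rows, cols = shape
    domains = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Distance from this pixel to the closest boundary pixel of each nucleus.
            d, lab = min(
                (min(math.hypot(r - pr, c - pc) for pr, pc in pts), lab)
                for lab, pts in nuclei.items()
            )
            if d <= r_max:
                domains[r][c] = lab
    return domains


# One-row toy example: nucleus 1 occupies columns 0-1, nucleus 2 sits at column 10.
toy = estimate_cell_domains((1, 12), {1: [(0, 0), (0, 1)], 2: [(0, 10)]}, r_max=3)
print(toy[0])
```

In the toy example, columns 5 and 6 lie more than 3 pixels from either nucleus and stay unassigned (label 0), which is how the radius constraint prevents unrealistically large cell domains.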

3. Morphological Measurements of Cells: From the nuclear and cytoplasmic segmentations, we compute
cell  features  including  locations,  areas,  shape  factors,  boundary  curvatures,  convexity,  eccentricity,  radius 
variation,  orientation,  and  various  texture  measures  (average  intensity,  intensity  variation,  skew  of  intensity 
distribution,  energy  of  intensity  distribution,  entropy  of  intensity  distribution,  interior  gradient,  and  ratios  of 
intensity  values  (e.g.,  max/min))  (45).  Not  all  features  are  needed  for  a  given  analysis,  and  the  user  can 
choose  an  appropriate  subset.  The  cytoplasmic  segmentation  step  produces  one  cytoplasmic  domain  per 
segmented  nucleus,  so  the  nuclear  identifiers  (IDs)  are  used  for  tabulating  nuclear  and  cytoplasmic 
measurements. 
4. Biomarker Measurements of Cells: Next, molecular biomarkers are quantified by measuring their
distribution  over  cellular  regions  of  interest  (masks)  defined  by  segmentation.  Figure  2E  shows  a  close-up 
view  of  these  regions  for  an  individual  cell.  The  red  outline  shows  the  intranuclear  compartment,  the  light 
blue  contours  delineate  the  intra-cytoplasmic  compartment,  and  the  orange  contour  runs  parallel  to  the  cell 
membrane  outline  (blue)  separated  by  a  fixed  distance  (5  pixels). 

Quantifying  Nuclear  Biomarkers:  Directly  summing  the  analyte  signal  over  intranuclear  compartments 
is  naive  since  it  does  not  correct  for  background  fluorescence.  Even  when  they  appear  dim,  background 
pixels  can  add  up  to  a  significant  sum  over  a  region.  To  address  this  problem,  we  first  perform  an  automatic 
2- or 3-level segmentation of the analyte channel (46). When the contrast between the analyte-positive pixels
and analyte-negative pixels is high, a 2-level binarization separates the bright foreground from definite
background pixels. When the analyte exhibits an intermediate background, a 3-level binarization (e.g.,
Figure 4) segregates pixels into bright foreground, intermediate background, and dark background. Only the
bright  foreground  pixels  are  used  for  analyte  association.  Supplementary  Figure  3  illustrates  these  steps  for 
quantifying  ER  in  a  breast  cancer  specimen.  Panel  D  shows  the  3-level  binarization  for  background 
correction. 
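
The effect of background correction on an integrated measurement can be shown with a small sketch. It compares a naive sum over a compartment mask with a sum restricted to bright foreground pixels. The foreground threshold is passed in as a parameter because the automatic 2- or 3-level segmentation method of (46) is not reproduced here; the function names and the toy intensities are illustrative assumptions.

```python
def naive_sum(analyte, mask):
    """Sum every analyte pixel inside the compartment mask."""
    return sum(value
               for a_row, m_row in zip(analyte, mask)
               for value, inside in zip(a_row, m_row) if inside)


def foreground_sum(analyte, mask, threshold):
    """Sum only the mask pixels classified as bright foreground (value > threshold)."""
    return sum(value
               for a_row, m_row in zip(analyte, mask)
               for value, inside in zip(a_row, m_row)
               if inside and value > threshold)


# Toy nucleus: two truly stained pixels (50, 60) in a dim background of 3.
analyte = [[3, 3, 3, 50],
           [3, 3, 60, 3]]
mask = [[True] * 4, [True] * 4]
print(naive_sum(analyte, mask), foreground_sum(analyte, mask, threshold=10))
```

Here the six dim background pixels add 18 units to the naive sum (128 versus 110). In a large compartment with few stained pixels, the same effect can dominate the measurement.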

Quantifying  Cytoplasmic  Markers:  Integration  of  markers  over  the  cytoplasmic  region  proceeds  as  with 
nuclei  -  the  background-corrected  analyte  signal  is  integrated  over  the  cytoplasmic  region  of  interest.  In 
Figure 2E, the cytoplasmic region of integration is enclosed by the blue outlines, but excluding the
intranuclear region.

Quantifying  Plasma  Membrane  Bound  Markers:  This  computation  must  cope  with  the  possibility  of 
an  unreliable  membrane  label  that  does  not  clearly  and  completely  define  the  cytoplasmic  domain  of  each 
cell. Happily, our cytoplasmic segmentation is designed to produce closed contours representing the best
possible estimates of cell membrane locations based on available cues. When a user determines that the
membrane signal is sufficiently reliable, membrane-bound analytes can be integrated within a narrow strip
(typically 5 pixels wide) of the segmented membrane. When the locations of cytoplasm and plasma membrane
markers  are  superimposed  or  extensively  overlap,  the  integration  is  carried  out  over  the  entire  cell  domain, 
with  background  correction.  The  resulting  biomarker  measurements  must  be  interpreted  with  care,  since  our 
images represent planar projections of sub-cellular compartments with finite thickness. When assigning
analyte  expression  to  sub-cellular  compartments,  one  must  acknowledge  that  these  two  compartments  cannot 
be  perfectly  distinguished  or  separated  in  the  images  being  analyzed.  Nevertheless,  these  measurements  are 
adequate  from  the  standpoint  of  labeling  cells  as  being  positive/negative  for  membrane  bound  antigens,  and 
for  statistical  analysis. 

5.  Cell  Type  Identification:  This  step  identifies  whether  a  cell  is  of  a  specified  type  based  on  its 
morphological  and  associative  features.  We  use  a  supervised  approach,  where  a  training  set  (containing 
examples  of  both  classes  from  one  or  more  images)  is  indicated  by  the  user,  from  which  a  Bayesian  classifier 
is  constructed.  Figure  2D  illustrates  cell  classification  results  for  the  example  image  shown  in  Figure  1, 
based  on  the  cytokeratin  signal.  Yellow  dots  represent  cells  that  are  CK+  and  HER2+,  and  white  dots 
represent  other  cells. 
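
A supervised Bayesian classifier of the kind described can be sketched in a few lines. The version below is a Gaussian naive Bayes model, which is one common choice and is assumed here for illustration; the text does not specify which Bayesian classifier FARSIGHT uses. The feature values (a mean cytoplasmic CK intensity and a second associative feature) and the function names `fit` and `predict` are likewise hypothetical.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[y] = (math.log(len(rows) / len(samples)), stats)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(y):
        log_prior, stats = model[y]
        return log_prior + sum(
            -0.5 * math.log(2.0 * math.pi * var) - (v - mean) ** 2 / (2.0 * var)
            for v, (mean, var) in zip(x, stats)
        )
    return max(model, key=log_posterior)


# User-indicated training cells: (mean CK intensity, second feature), class label.
training = [(80, 0.90), (90, 0.80), (85, 0.85), (10, 0.30), (12, 0.40), (8, 0.35)]
classes = ["CK+", "CK+", "CK+", "CK-", "CK-", "CK-"]
model = fit(training, classes)
print(predict(model, (88, 0.80)), predict(model, (9, 0.30)))
```

The training set plays the role of the user-indicated examples from one or more images; every remaining cell is then classified from its own feature vector.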

EXPERIMENTAL  RESULTS 

FARSIGHT  (www.farsight-toolkit.org)  was  written  using  standard  software  tools  (C++,  ITK,  VTK,  QT) 
and  allows  a  user  to  perform  automated  segmentation,  view  &  edit  the  results,  compute  morphological  & 
associative  features,  classify  cells,  and  export  the  results  to  spreadsheets.  It  is  both  free  and  open  source. 
Each  row  of  the  output  corresponds  to  one  numbered  cell  in  the  image.  The  software  was  validated  in  two 
ways.  First,  its  results  were  compared  against  determinations  made  by  a  human  expert.  Another  validation 
was based on in vitro cultured cells, labeled with different fluorochromes and mixed in different ratios to
create  cell  blocks  from  which  slides  were  cut  for  fluorescence  imaging  and  analysis.  Specifically,  cultured 
cells  were  labeled  with  the  membrane  dye,  PKH26,  or  with  a  combination  of  PKH26  and  PKH67.  The 
PKH26  cells  and  PKH26/PKH67  cells  were  mixed  in  different  ratios  (10:0,  9:1,  2:1,  1:1,  1:2,  1:9  and  0:10), 
fixed  and  frozen  in  OCT  embedding  media.  Slides  cut  from  these  cell  blocks  were  stained  with  DAPI  to 
reveal  nuclei  and  membrane  proteins  PKH26  and  PKH67.  The  details  of  the  protocols  and  results  are  in 
Supplement  A.  Ten  images  (400X)  were  taken  of  slides  from  each  block  and  processed  by  FARSIGHT  to 
segment  cells,  classify  them  as  PKH67-negative  or  PKH67-positive,  and  compute  the  ratio  of  the  two  cell 
populations. The results were in concordance with human expert scoring (Table A.1). The averages of cell
proportions determined by FARSIGHT closely approximated the known truth (Supplementary Figure A.1).
We  then  proceeded  to  evaluate  FARSIGHT  for  human  breast  histopathology  samples. 

Cell  membrane  Analyte  (HER2):  Figure  2  shows  our  analysis  of  the  image  in  Figure  1.  The  histogram 
in Figure 2F shows the distribution of HER2 in the cells. The cut-off value was 12.6 grayscale units, at which
98.5% of the tumor cells (CK+ cells) are HER2+. These data agree with an expert human reading of 99%.
In some cases, HER2 staining is not usable for cell boundary determination (e.g., HER2 staining overlays
cell nuclei or is extremely dark and thick; Supplementary Figure 1), so the cell boundaries were estimated
geometrically (Mode 1).

Nuclear  Analytes  (ER,  PR,  Ki67):  We  applied  our  methodology  to  specimens  stained  for  3  common 
nuclear-bound  markers,  ER,  PR  and  Ki67.  Supplementary  Figure  2  shows  the  detailed  steps  for  the  ER  case 
-  the  steps  were  identical  for  PR  and  Ki67.  Figure  3  shows  the  results  for  breast  cancer  specimens  stained 
for ER (A, B), PR (C, D), and Ki67 (E, F), respectively. As a crosscheck, we computed the ratios of nuclear
to  cytoplasmic  levels  of  the  analytes  for  every  cell.  Histograms  of  these  ratios  (panels  B,  D,  F)  show  that 
these  analytes  are  strongly  nuclear  bound,  as  expected  for  antigens  that  are  located  in  nuclei.  The 
automatically  determined  percentages  of  ER+,  PR+  and  Ki67+  cells  were  39%,  40%  and  27%  of  the  CK+ 
cells,  compared  to  expert  determined  percentages  of  38%,  39%  and  26%,  respectively.  For  comparison, 
pixel-level  analysis  to  determine  the  percentage  of  hematoxylin+  pixels  (the  image  area  occupied  by  nuclei) 
that  were  also  ER+,  PR+  or  Ki67+  yielded  17.3%,  28.5%  and  14.5%,  respectively.  Clearly,  area 
measurements  do  not  reflect  cell  numbers. 
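
The difference between the two kinds of percentage can be seen in a toy calculation. Each cell is described by its nuclear area in pixels and the number of those pixels that are analyte-positive; the numbers below are invented for illustration and are not taken from the specimens, and the function names are hypothetical.

```python
def cell_level_percent(cells, min_positive_pixels=1):
    """Percentage of cells with at least min_positive_pixels positive pixels."""
    positive = sum(1 for _, pos in cells if pos >= min_positive_pixels)
    return 100.0 * positive / len(cells)


def pixel_level_percent(cells):
    """Percentage of nuclear pixels that are analyte-positive."""
    return 100.0 * sum(pos for _, pos in cells) / sum(area for area, _ in cells)


# (nuclear area in pixels, analyte-positive pixels) for four hypothetical cells
cells = [(100, 40), (100, 0), (50, 25), (150, 0)]
print(cell_level_percent(cells), pixel_level_percent(cells))
```

Half of these cells are positive (50%), yet only 16.25% of nuclear pixels are stained, because staining rarely fills a whole nucleus and nuclei differ in size.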

Figure  4  illustrates  analysis  of  chosen  sub-populations  of  cells.  To  measure  cell  proliferation  and  its 
relationship  to  activity  of  the  Raf-MEK-ERK  signaling  pathway,  a  human  breast  carcinoma  was 
immunostained  for  Ki67,  p-ERK  and  CK.  CK  staining  reveals  a  cluster  of  carcinoma  cells  to  the  right,  but 
these  constitute  a  minority  of  the  cells;  the  majority  are  lymphocytes  within  a  reactive  lymphoid  nodule. 
Ki67  immunostaining  showed  that  34.6%  of  all  cells  were  proliferating.  For  comparison,  pixel-level  analysis 
showed  that  16.7%  of  hematoxylin+  pixels  were  Ki67+.  Only  2.1%  (2  of  96)  of  carcinoma  cells  are  Ki67+, 
while  37.8%  (414  of  1094  stromal  cells)  are  Ki67+.  Thus  the  total  number  or  percentage  of  Ki67+  cells  does 
not accurately report tumour cell proliferative activity. It demonstrates that a cell-based method with the
ability to type cells as tumour or stromal prior to analyte quantification is important for characterizing human
tumours,  where  the  cellular  composition  is  always  heterogeneous,  and  tumour  cells  may  not  predominate. 
Further  analysis  to  examine  the  correlation  between  Raf-MEK-ERK  signaling  and  proliferation  showed  a 
high correlation coefficient (R = 0.89) between p-ERK and Ki67 expression in cells (Figure 4E). It suggests that ERK
activation  and  proliferation  may  be  linked  events  among  the  cells  in  this  image.  This  is  expected,  since  the 
majority  of  proliferating  cells  are  lymphocytes,  and  ERK  activation  has  been  shown  to  accompany  mitogenic 
activation  of  lymphocytes  in  vitro  (47).  Due  to  the  low  frequency  of  Ki67  and  p-ERK  positivity  among  CK+ 
cells  in  this  image,  little  can  be  learned  about  concurrence  of  ERK  activation  and  proliferation  in  carcinoma 
cells  from  this  image  (Figure  4F). 

To  examine  the  relationship  between  ERK  activation  and  proliferation  in  breast  cancer  cells,  another 
region  of  the  same  tumour  (Figure  5A-C)  and  a  region  of  a  second,  similarly  stained  tumour  (Figure  5G-I) 
were  analyzed.  In  both  fields,  tumor  cells  are  the  majority,  and  a  significant  fraction  are  Ki67+  (10%  for 
tumour  1,  7.5%  for  tumour  2).  Scatter  plots  of  p-ERK  and  Ki67  expression  in  individual  cells  reveal  that  the 
correlation  between  p-ERK  and  Ki67  staining  is  less  among  the  CK+  carcinoma  cells  of  tumour  1  (R  =  0.59) 
and  tumour  2  (R  =  0.29)  than  among  the  reactive  lymphocytes  in  tumour  1  (Figure  4,  R=  0.89).  Based  on 
these  images,  the  link  between  ERK  activation  and  cell  proliferation  appears  weaker  in  the  tumour  cells  than 
in  the  reactive  lymphocytes,  illustrating  the  utility  of  specific  cell-level  analysis  as  a  research  tool. 
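
Because every measurement is tabulated per cell together with its cell type, correlations such as those above can be computed separately for each sub-population. The sketch below assumes that R denotes the Pearson correlation coefficient, which the text does not state explicitly, and uses invented per-cell values; the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_by_cell_type(cells):
    """cells: iterable of (cell_type, p_erk_level, ki67_level), one entry per cell."""
    groups = {}
    for cell_type, p_erk, ki67 in cells:
        xs, ys = groups.setdefault(cell_type, ([], []))
        xs.append(p_erk)
        ys.append(ki67)
    return {t: pearson_r(xs, ys) for t, (xs, ys) in groups.items()}


# Invented per-cell measurements, for illustration only.
cells = [
    ("lymphocyte", 1, 2), ("lymphocyte", 2, 4), ("lymphocyte", 3, 6),
    ("carcinoma", 1, 3), ("carcinoma", 2, 1), ("carcinoma", 3, 2),
]
print(r_by_cell_type(cells))
```

Pooling all cells into a single correlation would blend the two populations, which is the situation that per-cell-type analysis is meant to avoid.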

The  ability  of  our  method  to  separate  each  cell  into  nuclear  and  extranuclear  compartments  is  valuable. 
Figure  6  shows  a  breast  tumour  that  was  stained  with  antibodies  to  p-S6  (the  activated  form  of  ribosomal 
protein  S6),  CK  and  EMA,  all  by  immunofluorescence,  and  counterstained  with  hematoxylin.  Figure  6D 
shows  cell  segmentation  and  classification  results  with  yellow  contours  outlining  the  cytoplasmic  boundaries 
of  CK+  cells  determined  using  the  CK  and  EMA  channels  jointly.  The  sub-population  of  CK+  cells  that  are 
p-S6+  is  in  the  minority  (11%)  in  this  tumour  (for  comparison,  pixel  based  analysis  showed  that  8.9%  of 
CK+  pixels  are  p-S6+).  Visual  examination  of  the  p-S6+  cells  shows  that  p-S6  staining,  as  expected,  was 
predominantly  cytoplasmic.  This  was  confirmed  by  plotting  a  histogram  of  the  extra-nuclear  to  nuclear  ratio 
of  p-S6  signal  in  cells  that  expressed  this  antigen  (Figure  6F),  which  showed  that  only  10%  of  p-S6  signal 
was nuclear. This small amount of “nuclear” p-S6 may be explained by the fact that the image represents a
planar projection of a tumour section that is 5 µm thick; p-S6 staining in cell cytoplasm situated above or
below  nuclei  in  these  sections  would  register  as  nuclear. 
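
As a rough sketch of the compartment analysis described above, the following assumes each p-S6+ cell carries one integrated signal value per compartment. The field names `nuclear` and `extranuclear`, the bin edges and the numbers are hypothetical. Computing the nuclear share as pooled nuclear signal over pooled total signal is one plausible reading of the 10% figure, not a confirmed definition.

```python
def compartment_summary(cells, eps=1e-9):
    """Return per-cell extranuclear:nuclear ratios and the pooled nuclear share."""
    # eps guards against division by zero for cells with no nuclear signal.
    ratios = [c["extranuclear"] / max(c["nuclear"], eps) for c in cells]
    total_nuclear = sum(c["nuclear"] for c in cells)
    total_signal = total_nuclear + sum(c["extranuclear"] for c in cells)
    nuclear_fraction = total_nuclear / total_signal if total_signal > 0 else 0.0
    return ratios, nuclear_fraction


def histogram(values, edges):
    """Count values into half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Made-up per-cell signals for illustration only.
positive_cells = [
    {"nuclear": 10.0, "extranuclear": 90.0},
    {"nuclear": 5.0, "extranuclear": 95.0},
    {"nuclear": 20.0, "extranuclear": 80.0},
    {"nuclear": 5.0, "extranuclear": 95.0},
]
ratios, nuclear_fraction = compartment_summary(positive_cells)
counts = histogram(ratios, edges=[0, 1, 5, 10, 20])
```

Values outside the outermost edges are silently dropped by this `histogram` helper, so the edges should be chosen to span the observed ratios.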

DISCUSSION  &  CONCLUSIONS 

The  “histocytometric”  analyses  performed  by  FARSIGHT  on  the  images  shown  demonstrate  the 
practicality  and  value  of  quantifying  molecular  analytes  on  a  cellular  scale  with  cell-type  and  sub-cellular 
compartment  specificity.  Although  these  studies  focused  on  breast  cancer,  our  methodology  and  tools  are 
applicable  to  other  cancers  and  conditions.  Our  approach  requires  more  extensive  immunostaining  and 
sophisticated  imaging  compared  to  traditional  visual  histopathology,  but  offers  important  benefits.  It  reveals 
the  type,  distribution,  intrinsic  characteristics  and  biomarker  state  of  each  cell  in  its  tissue  context.  It  allows 
multiple  biomarkers  to  be  quantified  selectively  over  specified  cell-types,  regardless  of  their  abundance.  Our 
efforts  were  focused  on  quantifying  analytes  in  tumor  cells,  but  stromal  cells  (endothelial  cells,  fibroblasts, 
lymphocytes,  macrophages,  etc.)  are  omnipresent  in  tumors  and  gaining  attention  for  their  contributions  to 
malignant  progression  and  behavior  (48)  (49).  The  ability  of  histocytometry  to  specify  the  cell-type  for 
analysis  makes  it  a  sensitive  and  specific  tool  for  investigating  minority  stromal  cell  subpopulations,  whose 
attributes  would  otherwise  be  overshadowed  by  more  abundant  cell  types. 

Our  cell-based  method  shares  some  advantages  with  pixel-level  analysis,  such  as  objectivity, 
reproducibility,  and  ability  to  quantify  on  a  continuous  scale.  However,  by  using  the  cell  as  the  unit  of 
analysis,  it  generates  additional  and  potentially  complementary  measurements  expressible  in  terms  of  cell 
counts  and  cell  types.  Such  measurements  are  unaffected  by  the  area  occupied  by  cells  and  other  tissue 
structures  in  the  image.  While  the  two  types  of  measurements  can  be  correlated  for  some  samples,  they  can 
differ  greatly  for  others,  as  shown  by  our  examples.  For  analysis  of  histopathology  specimens,  both  methods 
are  usable  diagnostically,  but  we  believe  that  event  reporting  by  cell  number  or  percentage  is  biologically 
more  informative,  as  reflected  in  the  fact  that  it  is  the  preferred  form  of  reporting  for  many  in  vitro  cellular 
studies.  Our  software  system  makes  it  possible  to  generate  these  reports. 
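
A toy comparison shows why the two kinds of measurement can diverge. This sketch is a simplification: it treats every pixel of a positive cell as positive, whereas true pixel-level analysis thresholds each pixel independently. The field names `area_px` and `positive` and the values are hypothetical.

```python
def percent_positive_by_cell(cells):
    """Share of cells flagged positive, each cell counting once."""
    if not cells:
        return 0.0
    return 100.0 * sum(1 for c in cells if c["positive"]) / len(cells)


def percent_positive_by_area(cells):
    """Share of cell pixels that belong to positive cells (area-weighted)."""
    total_area = sum(c["area_px"] for c in cells)
    if total_area == 0:
        return 0.0
    positive_area = sum(c["area_px"] for c in cells if c["positive"])
    return 100.0 * positive_area / total_area


# One large positive cell among three small negative cells (made-up values).
cells = [
    {"area_px": 400, "positive": True},
    {"area_px": 100, "positive": False},
    {"area_px": 100, "positive": False},
    {"area_px": 100, "positive": False},
]
by_cell = percent_positive_by_cell(cells)
by_area = percent_positive_by_area(cells)
```

Here one cell in four is positive, yet it occupies more than half of the cell area, so the area-weighted figure is more than double the cell-count figure.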

Histocytometry correctly assigns analytes found in different subcellular locations within one cell (e.g. a
nuclear analyte and a cytoplasmic analyte) to the same unit. Results so organized have obvious benefits,
particularly when interrogating tissues for biological processes and events that occur, or are regulated, at the
level  of  individual  cells  but  involve  different  subcellular  compartments.  This  feature  of  FARSIGHT  analysis 
also  brings  the  ability  to  examine  and  quantify  analytes  in  different  compartments  of  cells.  This  is  an 
advantage  when  studying  analytes  whose  subcellular  location,  by  itself,  is  informative  about  activity.  For 
example, the transcription factor NF-κB is kept transcriptionally inactive when it is constrained in the
cytoplasm through binding to its inhibitor, IκB. NF-κB becomes active upon its translocation to the nucleus
following stimuli that induce release from and degradation of IκB (50). An extension of this is the study of
yet  other  analytes  that  produce  different  effects,  depending  on  whether  they  are  localized  to  the  cytoplasm  or 
nucleus.  Finally,  by  providing  analyte  data  for  each  cell  in  an  image  rather  than  one  result  for  the  image  as  a 
whole,  FARSIGHT  analysis  can  reveal  population  characteristics,  such  as  analyte  range,  distribution,  and 
variance  among  cells  that  can  be  additionally  informative.  Histocytometry  can  provide  information  similar  to 
that  provided  by  flow  cytometry  with  the  added  benefit  of  preserving  tissue  architecture,  which  allows 
concurrent  examination  of  morphological  features  and  quantification  of  spatial  relationships  and  distributions 
not  possible  with  the  dissociated  cells  used  for  flow  cytometry. 
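
One example of a spatial measurement that preserved tissue architecture makes possible is the distance from each tumor cell to its nearest lymphocyte. The sketch below assumes cell centroids are available as (x, y) pixel coordinates per class. The coordinates are made up, and the brute-force search is only suitable for small fields.

```python
import math


def nearest_neighbor_distances(sources, targets):
    """For each source centroid, the distance to the closest target centroid."""
    if not targets:
        raise ValueError("no target cells to measure against")
    return [min(math.dist(s, t) for t in targets) for s in sources]


# Hypothetical centroids in pixels (not measurements from this study).
tumor_centroids = [(0.0, 0.0), (10.0, 0.0), (30.0, 40.0)]
lymphocyte_centroids = [(3.0, 4.0), (30.0, 0.0)]

distances = nearest_neighbor_distances(tumor_centroids, lymphocyte_centroids)
mean_distance = sum(distances) / len(distances)
```

For whole-slide images with many thousands of cells, a spatial index would replace the nested loop, but the quantity being reported stays the same.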

We  developed  our  multiplex  immunostaining  protocols  for  the  study  of  formalin-fixed,  paraffin- 
embedded  (FFPE)  histopathology  specimens.  This  allows  histocytometric  analysis  to  be  performed  on  the 
tissue  material  most  commonly  available  from  cancer  patients  and  most  often  stored  in  pathology  archives. 
However,  frozen  and  other  forms  of  preserved  tissues  are  also  suitable  for  this  type  of  analysis;  their  study 
only  requires  development  of  appropriate  immunostaining  protocols.  These  protocols  have  involved 
immunostaining  for  four  or  more  antigens  on  the  same  slide  to  study  a  single  analyte.  This  level  of 
complexity  stems  from  the  need  to  stain  for  cell  type,  subcellular  compartments,  and  analyte  antigens  on  the 
same  slide.  Some  of  this  complexity  may  be  reduced  by  algorithms  for  direct  multi-spectral  identification  of 
tumor  cells  and  tumor  areas  in  slides  stained  only  with  hematoxylin  and  eosin  (H&E).  For  tumor  cell 
analysis,  computer  generated  “tumor  masks”  may  eliminate  the  need  to  immunostain  for  cell  type  and 
compartment antigens. By combining tumor masks with cell segmentation based on geometric
algorithms, histocytometric analysis may be performed on slides stained only for analyte and H&E, such as
breast cancer specimens stained for ER, PR and HER2 in hospital pathology laboratories. FARSIGHT can
also be applied to H&E-stained sections, with the caveat that the fluorescence of eosin must be properly
accounted for in the spectral unmixing. While the utility of developing methods for histocytometric analysis
of simply stained slides is primarily clinical, expanding the current limits of immunostain multiplexing will
make histocytometry an even more potent instrument for biology research. Greater multiplexing will allow
study of numerous analytes on the same slide. Accompanied by FARSIGHT cell-based quantification of their
expression, this will enable examination of complex patterns of signaling pathway activity and other
molecular events in cells in authentic tissue context. Although our examples did not show analysis of
multiple cell types, the system itself is capable of such analysis and we expect to report validation of this
capability in subsequent papers. As part of our effort to hasten development and advancement of this
histopathology analysis platform, FARSIGHT has been made available as a free and open source software
system (www.farsight-toolkit.org). In the future, we expect this system to be adapted to automated analysis
of larger batches of specimens, which may be multiplex stained by automated systems, and to whole-slide
scanning.

FIGURE LEGENDS


Figure  1:  Multiplex  stained  human  breast  cancer  specimen.  A  human  breast  cancer  was  stained  for  HER2 
by  immunofluorescence  using  Texas  red  and  for  cytokeratin  by  immunofluorescence  using  Alexa-488  and 
counterstained  with  hematoxylin.  The  slide  was  imaged  multi-spectrally  in  absorption  and  fluorescence 
modes,  and  the  results  unmixed  to  yield  non-overlapping  channels.  (A)  Brightfield  image  showing 
hematoxylin  staining.  (B)  Unmixed  channel  containing  only  cell  nuclei,  corresponding  to  the  hematoxylin 
spectral  signature.  (C)  Unmixed  channel  for  fluorescently-stained  cytokeratin.  (D)  Unmixed  channel 
corresponding  to  fluorescently  stained  HER2.  (E)  Composite  3-color  image  with  nuclei  (red),  cytokeratin 
(green),  and  HER2  (blue).  (F)  Spectral  signatures  used  for  the  unmixing  computations,  displayed  using  blue 
for  hematoxylin  (nuclei),  green  for  Alexa-488  (cytokeratin),  and  red  for  Texas  Red  (HER2). 

Figure  2:  Automated  image  analysis  steps  for  the  specimen  in  Figure  1.  (A)  Automatic  nuclear 
segmentation  (red  outlines)  of  the  nuclear  channel.  (B)  Estimated  cytoplasmic  domains  for  cytokeratin+  cells 
for  the  boxed  region  in  panel  D  overlaid  on  the  gradient  enhanced  distance  map  (Mode  0).  (C)  Geometrically 
estimated  cytoplasmic  domains  for  stromal  cells  in  the  same  region  overlaid  on  the  underlying  dominance 
map  (Mode  1).  (D)  Composite  cell  segmentation  &  classification  results,  with  yellow  dots  indicating  cells 
that  are  cytokeratin+  and  HER2+,  and  white  dots  indicating  other  cells.  (E)  Close-up  illustrating  regions  of 
interest  used  to  quantify  HER2.  (F)  Histogram  summary  showing  the  cutoff  point  for  declaring  cells  HER2+. 

Figure  3:  Examples  showing  analysis  of  breast  cancer  specimens  stained  for  three  nuclear-bound 
biomarkers.  Breast  cancer  slides  were  immunostained  for  estrogen  receptor  (A,  B),  progesterone  receptor 
(C,D),  or  Ki67  (E,  F)  plus  cytokeratin  and  counterstained  with  hematoxylin.  Images  were  captured,  and 
nuclear  and  whole  cell  segmentation  was  performed,  with  yellow  dots  indicating  the  nuclei  positive  for  the 
respective  analytes  (A,  C,  E).  Analyte  was  quantified  in  the  nuclear  and  extranuclear  compartments  of  each 
cell, and histograms of the ratios of nuclear to extranuclear analyte levels in all positive cells are shown (B,
D, F).

Figure 4: Duplex analysis of pERK and Ki67 immunostaining in lymphoid cells in a human breast
carcinoma.  A  section  of  a  breast  tumor  was  stained  sequentially  with  anti-pERK  (SG  blue),  anti-Ki67 
(DAB)  and  anti-CK  (Alexa-488)  antibodies,  followed  by  hematoxylin  staining,  multispectral  imaging  (400X) 
and  cytometric  analysis.  The  brightfield  image  of  a  lymphoid  nodule  in  the  tumor  is  shown  (A)  along  with 
the  unmixed  channels  for  DAB  (Ki67)  (B),  SG  blue  (p-ERK)  (C)  and  Alexa-488  (cytokeratin)  (D).  Scatter 
plots  of  p-ERK  (X-axis)  and  Ki67  (Y-axis)  staining  intensity  are  shown  for  cells  in  the  lymphoid  nodule  (E) 
and  for  tumor  cells  (F),  with  each  dot  representing  one  cell. 

Figure 5: Duplex analysis of pERK and Ki67 immunostaining in human breast carcinoma cells.
Sections of two different breast tumors were stained and analyzed as described for Figure 4. Brightfield
images of the two different tumors are shown (A, G), with the unmixed channels for DAB (Ki67) (B, H) and
SG blue (p-ERK) (C, I). Composite images showing whole cell segmentation of the tumor (cytokeratin+)
cells are shown (D, J). Scatter plots of p-ERK (X-axis) and Ki67 (Y-axis) staining intensity are shown for
tumor cells (E, K) and for non-tumor (stromal) cells (F, L), with each dot representing one cell.

Figure  6:  Analysis  of  phospho-S6  immunostaining  in  a  human  breast  cancer.  A  section  of  a  breast  tumor 
was  stained  with  anti-p-S6  (Alexa-488),  anti-EMA  (Alexa-594)  and  anti-CK  (Alexa-555),  followed  by 
hematoxylin  staining,  multispectral  imaging  (400X)  and  cytometric  analysis.  The  brightfield  image  is  shown 
(A)  along  with  unmixed  channels  for  Alexa-555  (CK)  (B)  and  Alexa-594  (EMA)  (C).  Composite  images  of 
p-S6  analyte  staining  along  with  segmented  whole  tumor  cells  are  shown  (D;  E  shows  an  enlargement  of  the 
boxed  area  in  D).  In  each  cell,  analyte  in  the  nuclear  and  extranuclear  compartment  was  quantified.  Ratios  of 
extranuclear  to  nuclear  analyte  were  calculated  for  each  positive  tumor  cell  and  their  distribution  is  shown 
(F). 


Supplementary  Figure  1:  Illustrating  application  of  the  proposed  methods  to  a  breast  cancer  specimen 
labeled  for  HER2  and  cell  nuclei  only.  The  specimen  was  stained  for  HER2  by  IHC  using  DAB  and 
counterstained with hematoxylin. (A) Brightfield image of slide showing HER2 staining in DAB (brown
chromogen) counterstained with hematoxylin. (B) Composite 2-color image with red corresponding to the
nuclear channel, and light blue corresponding to the HER2 channel. (C) Combined segmentation and
classification  results  overlaid  on  the  composite  image  shown  in  panel  B.  The  bright  blue  outlines  indicate  cell 
boundaries  that  were  estimated  using  a  fixed  distance  of  10  pixels  surrounding  cell  nuclei,  using  the  method 
described  as  Case  III.  Yellow  and  pink  dots  represent  HER2+  and  HER2-  cells  respectively. 

Supplementary  Figure  2:  Application  of  the  proposed  method  to  a  breast  cancer  specimen  labeled  for 
Estrogen  Receptor  (ER)  by  IHC  using  DAB,  and  for  CK  by  IF  using  Alexa-488  and  counterstained  with 
hematoxylin.  (A)  Brightfield  image.  (B)  Composite  3-color  image  after  spectral  unmixing  with  red,  green, 
and  blue  corresponding  to  the  nuclear,  cytokeratin  and  ER  channels  respectively.  (C)  The  raw  ER  channel. 
(D)  Three-level  binarization  of  the  ER  channel  for  background  correction.  (E)  Combined  segmentation  and 
classification  results  overlaid  on  the  composite  image  in  panel  B.  Yellow  and  pink  dots  indicate  ER+  and 
ER-  cells.  (F)  Histogram  of  the  background-corrected  intra-nuclear  ER  signal  in  cell  nuclei.  The  ER  density 
cut-off  value  between  ER-  and  ER+  is  55.2  and  39%  of  the  tumor  cells  (CK+)  are  identified  as  ER+.  This 
percentage is very close to the manual estimate, which is 38%.

SUPPLEMENT A: DETAILS OF IN VITRO VALIDATION EXPERIMENT


In vitro specimens used for validation purposes: We specifically prepared a set of slides for the purpose
of validating the performance of our integrated methodology. For this, K1735 murine melanoma tumor cells
were cultured in complete media (DMEM with 10% FBS and 10% penicillin/streptomycin). After
trypsinization, half of the cells were labeled with plasma membrane dye PKH26 while the other half of the
cells were labeled with PKH26 and PKH67 (Sigma Aldrich, Allentown, PA) according to instructions. Cells
from the two labeling reactions were washed, counted and mixed together in different ratios: 0%, 10%, 33%,
50%, 66%, 90% and 100%. Cells in the different mixtures were fixed in 2% paraformaldehyde for 2 minutes
and centrifuged. The cell pellets were snap frozen in liquid nitrogen, and 10 µm sections were cut from the
frozen blocks. Sections were stained with 4',6-diamidino-2-phenylindole (DAPI) to reveal nuclei.

Stained slides were imaged at appropriate wavelengths to reveal nuclei, PKH26 staining and PKH67
staining (Figure A.1, panels A-E). Images of the pure cases (10:0, and 0:10) are not shown. In
these images, the nuclear channel (DAPI) is displayed in red, the PKH26 channel is displayed in blue, and
PKH67 channel in green. Ten images (400X) were taken of slides from each block and processed by
FARSIGHT to segment cells and their nuclei, classify the segmented cells as PKH67-negative or PKH67-
positive, and compute the ratio of the two cell populations. The intermediate image analysis steps are not
shown; only the final cell classification results are displayed, by color coding the nuclear segmentation
seeds. PKH26+ cells are indicated as red dots and the PKH67+ cells are shown as yellow dots. A plot of the
measured proportion in every image (Y-axis) versus the true proportion of PKH67-negative cells (X-axis) is
shown in panel F and demonstrates that the averages of cell proportions determined by FARSIGHT closely
approximate the known truth. A comparison of the automatically found average percentages of
positive cells with the corresponding ground truths for a set of sample images is provided in Table A.1. In
interpreting these data, one must expect some natural variability from one image to the next, hence the reason
for analyzing 10 images for each ratio.

Table A.1: Summary of validation results comparing automated classification results against the
results produced by a human expert for several of the examples shown in this paper.

                  Percentage of positive cells
Image             Manual Average (%)    Automated Average (%)
In_vitro 0%       0                     2.4
In_vitro 10%      10                    11.1
In_vitro 33%      33                    32.8
In_vitro 50%      50                    53.7
In_vitro 66%      66                    66.9
In_vitro 90%      90                    86.4
In_vitro 100%     100                   98.0
In_vivo ER        38                    39
In_vivo PR        39                    40
In_vivo Ki67      26                    27
In_vivo HER2      99                    98.5

Supplementary Figure A.1: Validation using in vitro labeled cultured cell blocks. (A-E) Sample images
of slices of “blocks” of cultured cells labeled in vitro with PKH26, or with a combination of PKH26 and
PKH67 mixed in ratios (9:1, 2:1, 1:1, 1:2, & 1:9). Images of the pure cases (10:0, and 0:10) are not shown.
The nuclear channel (DAPI) is displayed in red, the PKH26 channel is displayed in blue, and PKH67 channel
in green. The final cell classification results are displayed by color-coding the nuclear segmentation seeds.
PKH26+ cells are indicated as red dots and the PKH67+ cells are shown as yellow dots. (F) A plot of the
measured proportion (Y-axis) versus the true proportion of PKH67-negative cells (X-axis) in every image
demonstrates that the averages of cell proportions determined by FARSIGHT very closely approximate the
known truth, and the absence of systematic bias.
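
The agreement summarized in Table A.1 can be restated numerically. This sketch uses the in vitro rows of the table as transcribed here and computes the mean signed difference (bias) and the mean absolute difference between automated and reference percentages. These two summary statistics are our choice for illustration and do not appear in the original analysis.

```python
# (mixture, reference %, automated %) from the in vitro rows of Table A.1.
IN_VITRO_ROWS = [
    ("0%", 0.0, 2.4),
    ("10%", 10.0, 11.1),
    ("33%", 33.0, 32.8),
    ("50%", 50.0, 53.7),
    ("66%", 66.0, 66.9),
    ("90%", 90.0, 86.4),
    ("100%", 100.0, 98.0),
]


def agreement(rows):
    """Summarize automated-minus-reference differences across mixtures."""
    diffs = [automated - reference for _, reference, automated in rows]
    n = len(diffs)
    worst = max(rows, key=lambda row: abs(row[2] - row[1]))
    return {
        "bias": sum(diffs) / n,                            # mean signed difference
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,   # mean absolute difference
        "worst_mixture": worst[0],                         # largest absolute difference
    }


stats = agreement(IN_VITRO_ROWS)
```

A bias close to zero alongside a small mean absolute difference is the pattern one would expect if errors scatter around the known mixing ratios rather than drifting in one direction.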
From: "Badri Roysam" <roysam@ecse.rpi.edu>

Date:  September  8,  2010  1:20:53  PM  EDT 

To:  leemingf@mail.med.upenn.edu,  yousef.kofahi@gmail.com,  gramak@rpi.edu,  wiem1las@yahoo.fr 

Subject: Fw: Histopathology - HISTOP-04-10-0247


—  Original  Message  — 

From:  HISedoffice@wiley.com 
To:  roysam@ecse.rpi.edu,  michal@rpi.edu 
Sent: Wed, 08 Sep 2010 11:54:52 -0400
Subject: Histopathology - HISTOP-04-10-0247


08-Sep-2010 

Dear  Professor  Roysam, 

Manuscript ID HISTOP-04-10-0247 entitled "Cell-based Quantification of
Molecular  Biomarkers  in  Histopathology  Specimens",  which  you  submitted  to 
Histopathology,  has  been  reviewed.  The  comments  of  the  reviewer(s)  are 
included  at  the  bottom  of  this  e-mail. 

The  reviewer(s)  have  recommended  publication,  but  suggest  some  minor 
revisions  to  your  manuscript.  Therefore,  I  invite  you  to  respond  to  the 
reviewer(s)'  comments  and  revise  your  manuscript  accordingly. 

To  revise  your  manuscript,  log  into  http://mc.manuscriptcentral.com/histop 
and  enter  your  Author  Center,  where  you  will  find  your  manuscript  title 
listed  under  "Manuscripts  with  Decisions."  Under  "Actions,"  click  on 
"Create  a  Revision".  Your  manuscript  number  has  been  appended  to  denote  a 
revision. 

You  will  be  unable  to  make  your  revisions  on  the  originally-submitted 
version  of  the  manuscript.  Instead,  revise  your  manuscript  using  a 
word-processing  program  and  save  it  onto  your  computer.  Please  also 
highlight  the  changes  to  your  manuscript  within  the  document  by  using  the 
track  changes  mode  in  MS  Word  or  by  using  bold  or  coloured  text. 

Once  the  revised  manuscript  is  prepared,  you  can  upload  it  and  submit  it 
through  your  Author  Center. 

When  submitting  your  revised  manuscript,  you  will  be  able  to  respond  to  the 
comments  made  by  the  reviewer(s)  in  the  space  provided.  You  can  use  this 
space  to  document  any  changes  you  make  to  the  original  manuscript.  In  order 
to  expedite  the  processing  of  the  revised  manuscript,  please  be  as  specific 
as  possible  in  your  response  to  the  reviewer(s). 

IMPORTANT:  Your  original  files  are  available  to  you  when  you  upload  your 
revised  manuscript.  Please  delete  any  redundant  files  before  completing  the 
submission. 

Because  we  are  trying  to  facilitate  timely  publication  of  manuscripts 
submitted  to  Histopathology,  your  revised  manuscript  should  be  submitted 
within  30  days.  If  it  is  not  possible  for  you  to  submit  your  revision 
within  a  reasonable  amount  of  time,  we  may  have  to  consider  your  paper  as  a 
new  submission. 

Once  again,  thank  you  for  submitting  your  manuscript  to  Histopathology  and 
I  look  forward  to  receiving  your  revision. 

With  kind  regards. 

Yours  sincerely, 

Professor  Michael  Wells 
Editor  in  Chief,  Histopathology 


Reviewer(s)'  Comments  to  Author: 

Referee:  1 

Comments  to  the  Author 

In  the  study,  the  authors  was  able  to  demonstrate  the  application  of 
FARSIGHT  software  to  analyze  microscopic  sections  using  spectral 
segmentation  method,  which  could  achieve  a  result  similar  to  human  expert 
analysis (Table A.1). The method could be useful in qualification and
quantification  of  immunohistochemistry  and  IF  signals,  and  might  offer  more 
objective  assessment  of  a  test  requiring  quantification  of  an  immuno-signal, 
like  cerbB2  and  grading  IF  in  renal  &  skin  Bx.  The  pictures  are  of  good 
quality  and  are  illustrative. 

Some  issues  need  to  be  addressed,  listed  as  follows: 

1. It appears that FARSIGHT, like most other image analysis tools, is a
proprietary  product  and  the  algorithm  of  analysis  is  not  known  to  many 
users. 

2.  Analysis  using  spectral  segmentation  requires  "significant  contrast" 
between  cellular  compartments  and  the  method  could  not  directly  apply  to 
conventional  HE  and  many  histochemical  sections. 

3.  Much  manual  preparations  is  required  in  a)  special  staining  and  b) 
selection  of  image. 

4.  It  appears  that  the  current  experimental  application  is  concentrated  with 
assessment  of  SINGLE  cell  type,  whereas  in  most  diseases,  there  will  be 
multiple  cell  types  that  may  express  the  same  markers  to  a  different  extent. 

5.  How  easy  or  feasible  is  this  technique  in  assessing  whole  slides?  Would 
it  be  too  tedious? 

6.  How  about  markers  showing  granular  staining  (e.g.  synaptophysin?) 


